PCREBUILD(3)               Library Functions Manual               PCREBUILD(3)

NAME

       PCRE - Perl-compatible regular expressions

BUILDING PCRE

       PCRE is distributed with a configure script that can be used to build
       the library in Unix-like environments using the applications known as
       Autotools. Also in the distribution are files to support building
       using CMake instead of configure. The text file README contains general
       information about building with Autotools (some of which is repeated
       below), and also has some comments about building on various operating
       systems. There is a lot more information about building PCRE without
       using Autotools (including information about using CMake and building
       "by hand") in the text file called NON-AUTOTOOLS-BUILD. You should
       consult this file as well as the README file if you are building in a
       non-Unix-like environment.

PCRE BUILD-TIME OPTIONS

       The rest of this document describes the optional features of PCRE that
       can be selected when the library is compiled. It assumes use of the
       configure script, where the optional features are selected or
       deselected by providing options to configure before running the make
       command. However, the same options can be selected in both Unix-like
       and non-Unix-like environments using the GUI facility of cmake-gui if
       you are using CMake instead of configure to build PCRE.

       If you are not using Autotools or CMake, option selection can be done
       by editing the config.h file, or by passing parameter settings to the
       compiler, as described in NON-AUTOTOOLS-BUILD.

       The complete list of options for configure (which includes the standard
       ones such as the selection of the installation directory) can be
       obtained by running

         ./configure --help

       The following sections include descriptions of options whose names
       begin with --enable or --disable. These settings specify changes to the
       defaults for the configure command. Because of the way that configure
       works, --enable and --disable always come in pairs, so the
       complementary option always exists as well, but as it specifies the
       default, it is not described.

BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES

       By default, a library called libpcre is built, containing functions
       that take string arguments contained in vectors of bytes, either as
       single-byte characters, or interpreted as UTF-8 strings. You can also
       build a separate library, called libpcre16, in which strings are
       contained in vectors of 16-bit data units and interpreted either as
       single-unit characters or UTF-16 strings, by adding

         --enable-pcre16

       to the configure command. You can also build yet another separate
       library, called libpcre32, in which strings are contained in vectors of
       32-bit data units and interpreted either as single-unit characters or
       UTF-32 strings, by adding

         --enable-pcre32

       to the configure command. If you do not want the 8-bit library, add

         --disable-pcre8

       as well. At least one of the three libraries must be built. Note that
       the C++ and POSIX wrappers are for the 8-bit library only, and that
       pcregrep is an 8-bit program. None of these are built if you select
       only the 16-bit or 32-bit libraries.
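
       The rules above can be sketched as a small shell helper that turns a
       list of wanted library widths into configure options. The function
       name width_args is hypothetical and not part of PCRE; it only prints
       option strings and assumes nothing beyond the three options described
       in this section.

```shell
# width_args: hypothetical helper, not part of PCRE.
# Usage: width_args 8 16 32   (prints the configure options needed)
width_args() {
    want8=no; want16=no; want32=no
    for w in "$@"; do
        case $w in
            8)  want8=yes ;;
            16) want16=yes ;;
            32) want32=yes ;;
            *)  echo "unknown width: $w" >&2; return 2 ;;
        esac
    done
    # At least one of the three libraries must be built.
    if [ "$want8$want16$want32" = "nonono" ]; then
        echo "at least one library width is required" >&2
        return 1
    fi
    args=""
    # The 8-bit library is the default, so it only needs an option
    # when it is NOT wanted.
    [ "$want8" = no ]   && args="$args --disable-pcre8"
    [ "$want16" = yes ] && args="$args --enable-pcre16"
    [ "$want32" = yes ] && args="$args --enable-pcre32"
    # Strip the leading space before printing.
    echo "${args# }"
}
```

       From the top of the source tree, such a helper could be used as
       ./configure $(width_args 16 32), which builds the 16-bit and 32-bit
       libraries without the 8-bit one (and therefore without the C++ and
       POSIX wrappers and pcregrep).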

BUILDING SHARED AND STATIC LIBRARIES

       The Autotools PCRE building process uses libtool to build both shared
       and static libraries by default. You can suppress one of these by
       adding one of

         --disable-shared
         --disable-static

       to the configure command, as required.

C++ SUPPORT

       By default, if the 8-bit library is being built, the configure script
       will search for a C++ compiler and C++ header files. If it finds them,
       it automatically builds the C++ wrapper library (which supports only
       8-bit strings). You can disable this by adding

         --disable-cpp

       to the configure command.

UTF-8, UTF-16 AND UTF-32 SUPPORT

       To build PCRE with support for UTF Unicode character strings, add

         --enable-utf

       to the configure command. This setting applies to all three libraries,
       adding support for UTF-8 to the 8-bit library, support for UTF-16 to
       the 16-bit library, and support for UTF-32 to the 32-bit library. There
       are no separate options for enabling UTF-8, UTF-16 and UTF-32
       independently because that would allow ridiculous settings such as
       requesting UTF-16 support while building only the 8-bit library. It is
       not possible to build one library with UTF support and another without
       in the same configuration. (For backwards compatibility, --enable-utf8
       is a synonym of --enable-utf.)

       Of itself, this setting does not make PCRE treat strings as UTF-8,
       UTF-16 or UTF-32. As well as compiling PCRE with this option, you also
       have to set the PCRE_UTF8, PCRE_UTF16 or PCRE_UTF32 option (as
       appropriate) when you call one of the pattern compiling functions.

       If you set --enable-utf when compiling in an EBCDIC environment, PCRE
       expects its input to be either ASCII or UTF-8 (depending on the
       run-time option). It is not possible to support both EBCDIC and UTF-8
       codes in the same version of the library. Consequently, --enable-utf
       and --enable-ebcdic are mutually exclusive.
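
       As a sketch of the mutual-exclusion rule, a wrapper script could check
       a proposed option list before handing it to configure. The function
       name check_utf_ebcdic is hypothetical and not part of PCRE. It treats
       --enable-unicode-properties as implying UTF and --enable-ebcdic-nl25
       as implying EBCDIC, as described later in this document.

```shell
# check_utf_ebcdic: hypothetical helper, not part of PCRE.
# Returns non-zero if the option list asks for both UTF and EBCDIC.
check_utf_ebcdic() {
    utf=no; ebcdic=no
    for arg in "$@"; do
        case $arg in
            # --enable-utf8 is a synonym; Unicode properties imply UTF.
            --enable-utf|--enable-utf8|--enable-unicode-properties)
                utf=yes ;;
            # --enable-ebcdic-nl25 may be given instead of --enable-ebcdic.
            --enable-ebcdic|--enable-ebcdic-nl25)
                ebcdic=yes ;;
        esac
    done
    if [ "$utf" = yes ] && [ "$ebcdic" = yes ]; then
        echo "error: UTF and EBCDIC options are mutually exclusive" >&2
        return 1
    fi
    return 0
}
```

       A call such as check_utf_ebcdic --enable-utf --enable-pcre16 succeeds,
       whereas check_utf_ebcdic --enable-utf --enable-ebcdic reports an error
       and returns a non-zero status.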

UNICODE CHARACTER PROPERTY SUPPORT

       UTF support allows the libraries to process character codepoints up to
       0x10ffff in the strings that they handle. On its own, however, it does
       not provide any facilities for accessing the properties of such
       characters. If you want to be able to use the pattern escapes \P, \p,
       and \X, which refer to Unicode character properties, you must add

         --enable-unicode-properties

       to the configure command. This implies UTF support, even if you have
       not explicitly requested it.

       Including Unicode property support adds around 30K of tables to the
       PCRE library. Only the general category properties such as Lu and Nd
       are supported. Details are given in the pcrepattern documentation.

JUST-IN-TIME COMPILER SUPPORT

       Just-in-time compiler support is included in the build by specifying

         --enable-jit

       This support is available only for certain hardware architectures. If
       this option is set for an unsupported architecture, a compile time
       error occurs. See the pcrejit documentation for a discussion of JIT
       usage. When JIT support is enabled, pcregrep automatically makes use of
       it, unless you add

         --disable-pcregrep-jit

       to the "configure" command.

CODE VALUE OF NEWLINE

       By default, PCRE interprets the linefeed (LF) character as indicating
       the end of a line. This is the normal newline character on Unix-like
       systems. You can compile PCRE to use carriage return (CR) instead, by
       adding

         --enable-newline-is-cr

       to the configure command. There is also a --enable-newline-is-lf
       option, which explicitly specifies linefeed as the newline character.

       Alternatively, you can specify that line endings are to be indicated by
       the two character sequence CRLF. If you want this, add

         --enable-newline-is-crlf

       to the configure command. There is a fourth option, specified by

         --enable-newline-is-anycrlf

       which causes PCRE to recognize any of the three sequences CR, LF, or
       CRLF as indicating a line ending. Finally, a fifth option, specified by

         --enable-newline-is-any

       causes PCRE to recognize any Unicode newline sequence.

       Whatever line ending convention is selected when PCRE is built can be
       overridden when the library functions are called. At build time it is
       conventional to use the standard for your operating system.
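
       The five newline settings follow one naming pattern, which a build
       script could exploit as in the sketch below. The function name
       newline_option is hypothetical and not part of PCRE; it simply maps a
       convention name to the matching configure option.

```shell
# newline_option: hypothetical helper, not part of PCRE.
# $1 = one of: cr, lf, crlf, anycrlf, any
newline_option() {
    case $1 in
        cr|lf|crlf|anycrlf|any)
            echo "--enable-newline-is-$1" ;;
        *)
            echo "unknown newline convention: $1" >&2
            return 1 ;;
    esac
}
```

       For example, ./configure $(newline_option anycrlf) selects recognition
       of CR, LF, or CRLF as the build-time default.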

WHAT \R MATCHES

       By default, the sequence \R in a pattern matches any Unicode newline
       sequence, whatever has been selected as the line ending sequence. If
       you specify

         --enable-bsr-anycrlf

       the default is changed so that \R matches only CR, LF, or CRLF.
       Whatever is selected when PCRE is built can be overridden when the
       library functions are called.

POSIX MALLOC USAGE

       When the 8-bit library is called through the POSIX interface (see the
       pcreposix documentation), additional working storage is required for
       holding the pointers to capturing substrings, because PCRE requires
       three integers per substring, whereas the POSIX interface provides only
       two. If the number of expected substrings is small, the wrapper
       function uses space on the stack, because this is faster than using
       malloc() for each call. The default threshold above which the stack is
       no longer used is 10; it can be changed by adding a setting such as

         --with-posix-malloc-threshold=20

       to the configure command.

HANDLING VERY LARGE PATTERNS

       Within a compiled pattern, offset values are used to point from one
       part to another (for example, from an opening parenthesis to an
       alternation metacharacter). By default, in the 8-bit and 16-bit
       libraries, two-byte values are used for these offsets, leading to a
       maximum size for a compiled pattern of around 64K. This is sufficient
       to handle all but the most gigantic patterns. Nevertheless, some people
       do want to process truly enormous patterns, so it is possible to
       compile PCRE to use three-byte or four-byte offsets by adding a setting
       such as

         --with-link-size=3

       to the configure command. The value given must be 2, 3, or 4. For the
       16-bit library, a value of 3 is rounded up to 4. In these libraries,
       using longer offsets slows down the operation of PCRE because it has to
       load additional data when handling them. For the 32-bit library the
       value is always 4 and cannot be overridden; the value of
       --with-link-size is ignored.
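
       The way the requested link size is interpreted by each library can be
       summarised in a short sketch. The function name effective_link_size is
       hypothetical and not part of PCRE; it encodes only the rules stated
       above and, as a design choice of the sketch, rejects values other than
       2, 3, or 4 for every library width.

```shell
# effective_link_size: hypothetical helper, not part of PCRE.
# $1 = library width in bits (8, 16 or 32)
# $2 = value requested with --with-link-size
effective_link_size() {
    case $2 in
        2|3|4) ;;
        *) echo "link size must be 2, 3, or 4" >&2; return 1 ;;
    esac
    case $1 in
        8)  echo "$2" ;;                      # used as given
        16) if [ "$2" -eq 3 ]; then           # 3 is rounded up to 4
                echo 4
            else
                echo "$2"
            fi ;;
        32) echo 4 ;;                         # always 4, request ignored
        *)  echo "unknown library width: $1" >&2; return 1 ;;
    esac
}
```

       With these rules, a build configured with --with-link-size=3 ends up
       with three-byte offsets in the 8-bit library and four-byte offsets in
       the 16-bit and 32-bit libraries.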

AVOIDING EXCESSIVE STACK USAGE

       When matching with the pcre_exec() function, PCRE implements
       backtracking by making recursive calls to an internal function called
       match(). In environments where the size of the stack is limited, this
       can severely limit PCRE's operation. (The Unix environment does not
       usually suffer from this problem, but it may sometimes be necessary to
       increase the maximum stack size. There is a discussion in the pcrestack
       documentation.) An alternative approach to recursion that uses memory
       from the heap to remember data, instead of using recursive function
       calls, has been implemented to work round the problem of limited stack
       size. If you want to build a version of PCRE that works this way, add

         --disable-stack-for-recursion

       to the configure command. With this configuration, PCRE will use the
       pcre_stack_malloc and pcre_stack_free variables to call memory
       management functions. By default these point to malloc() and free(),
       but you can replace the pointers so that your own functions are used
       instead.

       Separate functions are provided rather than using pcre_malloc and
       pcre_free because the usage is very predictable: the block sizes
       requested are always the same, and the blocks are always freed in
       reverse order. A calling program might be able to implement optimized
       functions that perform better than malloc() and free(). PCRE runs
       noticeably more slowly when built in this way. This option affects only
       the pcre_exec() function; it is not relevant for pcre_dfa_exec().

LIMITING PCRE RESOURCE USAGE

       Internally, PCRE has a function called match(), which it calls
       repeatedly (sometimes recursively) when matching a pattern with the
       pcre_exec() function. By controlling the maximum number of times this
       function may be called during a single matching operation, a limit can
       be placed on the resources used by a single call to pcre_exec(). The
       limit can be changed at run time, as described in the pcreapi
       documentation. The default is 10 million, but this can be changed by
       adding a setting such as

         --with-match-limit=500000

       to the configure command. This setting has no effect on the
       pcre_dfa_exec() matching function.

       In some environments it is desirable to limit the depth of recursive
       calls of match() more strictly than the total number of calls, in order
       to restrict the maximum amount of stack (or heap, if
       --disable-stack-for-recursion is specified) that is used. A second
       limit controls this; it defaults to the value that is set for
       --with-match-limit, which imposes no additional constraints. However,
       you can set a lower limit by adding, for example,

         --with-match-limit-recursion=10000

       to the configure command. This value can also be overridden at run
       time.
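
       The relationship between the two limits can be illustrated with a
       small sketch. The function name limit_args is hypothetical and not
       part of PCRE. Rejecting a recursion limit that exceeds the match limit
       is a choice made for this sketch, on the reasoning that such a value
       would add no constraint; it is not a check that configure is known to
       perform.

```shell
# limit_args: hypothetical helper, not part of PCRE.
# $1 = total match() call limit, $2 = recursion depth limit
limit_args() {
    # A recursion limit above the match limit constrains nothing,
    # so this sketch treats it as a mistake (an assumption).
    if [ "$2" -gt "$1" ]; then
        echo "recursion limit $2 exceeds match limit $1" >&2
        return 1
    fi
    echo "--with-match-limit=$1 --with-match-limit-recursion=$2"
}
```

       For example, ./configure $(limit_args 500000 10000) sets both
       build-time defaults using the values shown in this section.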

CREATING CHARACTER TABLES AT BUILD TIME

       PCRE uses fixed tables for processing characters whose code values are
       less than 256. By default, PCRE is built with a set of tables that are
       distributed in the file pcre_chartables.c.dist. These tables are for
       ASCII codes only. If you add

         --enable-rebuild-chartables

       to the configure command, the distributed tables are no longer used.
       Instead, a program called dftables is compiled and run. This outputs
       the source for a new set of tables, created in the default locale of
       your C run-time system. (This method of replacing the tables does not
       work if you are cross compiling, because dftables is run on the local
       host. If you need to create alternative tables when cross compiling,
       you will have to do so "by hand".)

USING EBCDIC CODE

       PCRE assumes by default that it will run in an environment where the
       character code is ASCII (or Unicode, which is a superset of ASCII).
       This is the case for most computer operating systems. PCRE can,
       however, be compiled to run in an EBCDIC environment by adding

         --enable-ebcdic

       to the configure command. This setting implies
       --enable-rebuild-chartables. You should only use it if you know that
       you are in an EBCDIC environment (for example, an IBM mainframe
       operating system). The --enable-ebcdic option is incompatible with
       --enable-utf.

       The EBCDIC character that corresponds to an ASCII LF is assumed to have
       the value 0x15 by default. However, in some EBCDIC environments, 0x25
       is used. In such an environment you should use

         --enable-ebcdic-nl25

       as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
       has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and
       0x25 is not chosen as LF is made to correspond to the Unicode NEL
       character (which, in Unicode, is 0x85).

       The options that select newline behaviour, such as
       --enable-newline-is-cr, and equivalent run-time options, refer to these
       character values in an EBCDIC environment.

PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT

       By default, pcregrep reads all files as plain text. You can build it so
       that it recognizes files whose names end in .gz or .bz2, and reads them
       with libz or libbz2, respectively, by adding one or both of

         --enable-pcregrep-libz
         --enable-pcregrep-libbz2

       to the configure command. These options naturally require that the
       relevant libraries are installed on your system. Configuration will
       fail if they are not.

PCREGREP BUFFER SIZE

       pcregrep uses an internal buffer to hold a "window" on the file it is
       scanning, in order to be able to output "before" and "after" lines when
       it finds a match. The size of the buffer is controlled by a parameter
       whose default value is 20K. The buffer itself is three times this size,
       but because of the way it is used for holding "before" lines, the
       longest line that is guaranteed to be processable is the parameter
       size. You can change the default parameter value by adding, for
       example,

         --with-pcregrep-bufsize=50K

       to the configure command. The caller of pcregrep can, however, override
       this value by specifying a run-time option.
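
       The arithmetic behind the buffer size can be worked through in a short
       sketch. The function name bufsize_bytes is hypothetical and not part
       of PCRE, and the sketch assumes that the K suffix means 1024 bytes.

```shell
# bufsize_bytes: hypothetical helper, not part of PCRE.
# Converts a size such as 50K to bytes, ASSUMING K means 1024.
bufsize_bytes() {
    case $1 in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

# The parameter is the longest line guaranteed to be processable.
param=$(bufsize_bytes 50K)
# The buffer itself is three times the parameter size.
buffer=$(( param * 3 ))
echo "parameter: $param bytes, buffer: $buffer bytes"
```

       Under that assumption, a 50K parameter corresponds to 51200 bytes and
       a buffer of 153600 bytes.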

PCRETEST OPTION FOR LIBREADLINE SUPPORT

       If you add

         --enable-pcretest-libreadline

       to the configure command, pcretest is linked with the libreadline
       library, and when its input is from a terminal, it reads it using the
       readline() function. This provides line-editing and history facilities.
       Note that libreadline is GPL-licensed, so if you distribute a binary of
       pcretest linked in this way, there may be licensing issues.

       Setting this option causes the -lreadline option to be added to the
       pcretest build. In many operating environments with a system-installed
       libreadline this is sufficient. However, in some environments (e.g. if
       an unmodified distribution version of readline is in use), some extra
       configuration may be necessary. The INSTALL file for libreadline says
       this:

         "Readline uses the termcap functions, but does not link with the
         termcap or curses library itself, allowing applications which link
         with readline the to choose an appropriate library."

       If your environment has not been set up so that an appropriate library
       is automatically included, you may need to add something like

         LIBS="-lncurses"

       immediately before the configure command.

DEBUGGING WITH VALGRIND SUPPORT

       If you add the

         --enable-valgrind

       option to the configure command, PCRE will use valgrind annotations
       to mark certain memory regions as unaddressable. This allows it to
       detect invalid memory accesses, and is mostly useful for debugging PCRE
       itself.

CODE COVERAGE REPORTING

       If your C compiler is gcc, you can build a version of PCRE that can
       generate a code coverage report for its test suite. To enable this, you
       must install lcov version 1.6 or above. Then specify

         --enable-coverage

       to the configure command and build PCRE in the usual way.

       Note that using ccache (a caching C compiler) is incompatible with code
       coverage reporting. If you have configured ccache to run automatically
       on your system, you must set the environment variable

         CCACHE_DISABLE=1

       before running make to build PCRE, so that ccache is not used.

       When --enable-coverage is used, the following additional targets are
       added to the Makefile:

         make coverage

       This creates a fresh coverage report for the PCRE test suite. It is
       equivalent to running "make coverage-reset", "make coverage-baseline",
       "make check", and then "make coverage-report".

         make coverage-reset

       This zeroes the coverage counters, but does nothing else.

         make coverage-baseline

       This captures baseline coverage information.

         make coverage-report

       This creates the coverage report.

         make coverage-clean-report

       This removes the generated coverage report without cleaning the
       coverage data itself.

         make coverage-clean-data

       This removes the captured coverage data without removing the coverage
       files created at compile time (*.gcno).

         make coverage-clean

       This cleans all coverage data including the generated coverage report.
       For more information about code coverage, see the gcov and lcov
       documentation.
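
       The sequence that "make coverage" stands for can be spelled out as in
       the sketch below. The variable name COVERAGE_STEPS is an assumption
       made for illustration. The script only prints the commands; to run
       them, remove the echo and start from the top of a source tree
       configured with --enable-coverage.

```shell
# Illustrative only: prints the commands instead of running them.

# ccache is incompatible with coverage reporting, so disable it.
CCACHE_DISABLE=1
export CCACHE_DISABLE

# The four steps that "make coverage" is described as equivalent to,
# in the order in which they run.
COVERAGE_STEPS="coverage-reset coverage-baseline check coverage-report"

echo "./configure --enable-coverage"
for target in $COVERAGE_STEPS; do
    echo "make $target"
done
```

       Running the steps one at a time in this way makes it possible to stop
       after "make check" and inspect the raw coverage data before the report
       is generated.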

SEE ALSO

       pcreapi(3), pcre16, pcre32, pcre_config(3).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 12 May 2013
       Copyright (c) 1997-2013 University of Cambridge.

PCRE 8.33                         12 May 2013                     PCREBUILD(3)