1PCREBUILD(3) Library Functions Manual PCREBUILD(3)
2
3
4
6 PCRE - Perl-compatible regular expressions
7
9
10 PCRE is distributed with a configure script that can be used to build
11 the library in Unix-like environments using the applications known as
12 Autotools. Also in the distribution are files to support building
13 using CMake instead of configure. The text file README contains general
14 information about building with Autotools (some of which is repeated
15 below), and also has some comments about building on various operating
16 systems. There is a lot more information about building PCRE without
17 using Autotools (including information about using CMake and building
18 "by hand") in the text file called NON-AUTOTOOLS-BUILD. You should
19 consult this file as well as the README file if you are building in a
20 non-Unix-like environment.
21
23
24 The rest of this document describes the optional features of PCRE that
25 can be selected when the library is compiled. It assumes use of the
26 configure script, where the optional features are selected or dese‐
27 lected by providing options to configure before running the make com‐
28 mand. However, the same options can be selected in both Unix-like and
29 non-Unix-like environments using the GUI facility of cmake-gui if you
30 are using CMake instead of configure to build PCRE.
31
32 If you are not using Autotools or CMake, option selection can be done
33 by editing the config.h file, or by passing parameter settings to the
34 compiler, as described in NON-AUTOTOOLS-BUILD.
35
36 The complete list of options for configure (which includes the standard
37 ones such as the selection of the installation directory) can be
38 obtained by running
39
40 ./configure --help
41
42 The following sections include descriptions of options whose names
43 begin with --enable or --disable. These settings specify changes to the
44 defaults for the configure command. Because of the way that configure
45 works, --enable and --disable always come in pairs, so the complemen‐
46 tary option always exists as well, but as it specifies the default, it
47 is not described.
48
50
51 By default, a library called libpcre is built, containing functions
52 that take string arguments contained in vectors of bytes, either as
53 single-byte characters, or interpreted as UTF-8 strings. You can also
54 build a separate library, called libpcre16, in which strings are con‐
55 tained in vectors of 16-bit data units and interpreted either as sin‐
56 gle-unit characters or UTF-16 strings, by adding
57
58 --enable-pcre16
59
60 to the configure command. You can also build yet another separate
61 library, called libpcre32, in which strings are contained in vectors of
62 32-bit data units and interpreted either as single-unit characters or
63 UTF-32 strings, by adding
64
65 --enable-pcre32
66
67 to the configure command. If you do not want the 8-bit library, add
68
69 --disable-pcre8
70
71 as well. At least one of the three libraries must be built. Note that
72 the C++ and POSIX wrappers are for the 8-bit library only, and that
73 pcregrep is an 8-bit program. None of these are built if you select
74 only the 16-bit or 32-bit libraries.
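
As a rough illustration of the rules above, the following shell sketch scans a list of configure arguments and reports which libraries would result. The function name pcre_build_plan and its output format are invented for this example; they are not part of PCRE or its configure script.

```shell
# Hypothetical helper (not part of PCRE): report which libraries a given
# set of configure arguments would select. "extras" stands for the
# 8-bit-only items (C++ and POSIX wrappers, pcregrep), which are only
# candidates for building when the 8-bit library itself is built.
pcre_build_plan() {
    lib8=yes; lib16=no; lib32=no      # the 8-bit library is the default
    for arg in "$@"; do
        case "$arg" in
            --disable-pcre8) lib8=no ;;
            --enable-pcre16) lib16=yes ;;
            --enable-pcre32) lib32=yes ;;
        esac
    done
    if [ "$lib8$lib16$lib32" = nonono ]; then
        echo "error: at least one of the three libraries must be built" >&2
        return 1
    fi
    echo "libpcre=$lib8 libpcre16=$lib16 libpcre32=$lib32 extras=$lib8"
}

pcre_build_plan --enable-pcre16 --disable-pcre8
```

Here the call reports that only libpcre16 is selected and that the 8-bit-only extras are skipped.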
75
77
78 The Autotools PCRE building process uses libtool to build both shared
79 and static libraries by default. You can suppress one of these by
80 adding one of
81
82 --disable-shared
83 --disable-static
84
85 to the configure command, as required.
86
88
89 By default, if the 8-bit library is being built, the configure script
90 will search for a C++ compiler and C++ header files. If it finds them,
91 it automatically builds the C++ wrapper library (which supports only
92 8-bit strings). You can disable this by adding
93
94 --disable-cpp
95
96 to the configure command.
97
99
100 To build PCRE with support for UTF Unicode character strings, add
101
102 --enable-utf
103
104 to the configure command. This setting applies to all three libraries,
105 adding support for UTF-8 to the 8-bit library, support for UTF-16 to
106 the 16-bit library, and support for UTF-32 to the 32-bit
107 library. There are no separate options for enabling UTF-8, UTF-16 and
108 UTF-32 independently because that would allow ridiculous settings such
109 as requesting UTF-16 support while building only the 8-bit library. It
110 is not possible to build one library with UTF support and another
111 without in the same configuration. (For backwards compatibility,
112 --enable-utf8 is a synonym of --enable-utf.)
113
114 Of itself, this setting does not make PCRE treat strings as UTF-8,
115 UTF-16 or UTF-32. As well as compiling PCRE with this option, you also
116 have to set the PCRE_UTF8, PCRE_UTF16 or PCRE_UTF32 option (as
117 appropriate) when you call one of the pattern compiling functions.
118
119 If you set --enable-utf when compiling in an EBCDIC environment, PCRE
120 expects its input to be either ASCII or UTF-8 (depending on the run-
121 time option). It is not possible to support both EBCDIC and UTF-8 codes
122 in the same version of the library. Consequently, --enable-utf and
123 --enable-ebcdic are mutually exclusive.
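
The mutual exclusion described above can be checked before configure is run. The sketch below is one possible pre-flight check; the function name check_utf_ebcdic is hypothetical and is not something the PCRE distribution provides.

```shell
# Hypothetical pre-flight check (not part of PCRE): fail if the argument
# list asks for both UTF support and EBCDIC support.
check_utf_ebcdic() {
    utf=no; ebcdic=no
    for arg in "$@"; do
        case "$arg" in
            # --enable-utf8 is a synonym; Unicode properties imply UTF.
            --enable-utf|--enable-utf8|--enable-unicode-properties) utf=yes ;;
            # --enable-ebcdic-nl25 may be given instead of --enable-ebcdic.
            --enable-ebcdic|--enable-ebcdic-nl25) ebcdic=yes ;;
        esac
    done
    if [ "$utf" = yes ] && [ "$ebcdic" = yes ]; then
        echo "error: --enable-utf and --enable-ebcdic are mutually exclusive" >&2
        return 1
    fi
    return 0
}

check_utf_ebcdic --enable-utf --enable-pcre16 && echo "options are compatible"
```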
124
126
127 UTF support allows the libraries to process character codepoints up to
128 0x10ffff in the strings that they handle. On its own, however, it does
129 not provide any facilities for accessing the properties of such charac‐
130 ters. If you want to be able to use the pattern escapes \P, \p, and \X,
131 which refer to Unicode character properties, you must add
132
133 --enable-unicode-properties
134
135 to the configure command. This implies UTF support, even if you have
136 not explicitly requested it.
137
138 Including Unicode property support adds around 30K of tables to the
139 PCRE library. Only the general category properties such as Lu and Nd
140 are supported. Details are given in the pcrepattern documentation.
141
143
144 Just-in-time compiler support is included in the build by specifying
145
146 --enable-jit
147
148 This support is available only for certain hardware architectures. If
149 this option is set for an unsupported architecture, a compile time
150 error occurs. See the pcrejit documentation for a discussion of JIT
151 usage. When JIT support is enabled, pcregrep automatically makes use of
152 it, unless you add
153
154 --disable-pcregrep-jit
155
156 to the configure command.
157
159
160 By default, PCRE interprets the linefeed (LF) character as indicating
161 the end of a line. This is the normal newline character on Unix-like
162 systems. You can compile PCRE to use carriage return (CR) instead, by
163 adding
164
165 --enable-newline-is-cr
166
167 to the configure command. There is also a --enable-newline-is-lf
168 option, which explicitly specifies linefeed as the newline character.
169
170 Alternatively, you can specify that line endings are to be indicated by
171 the two character sequence CRLF. If you want this, add
172
173 --enable-newline-is-crlf
174
175 to the configure command. There is a fourth option, specified by
176
177 --enable-newline-is-anycrlf
178
179 which causes PCRE to recognize any of the three sequences CR, LF, or
180 CRLF as indicating a line ending. Finally, a fifth option, specified by
181
182 --enable-newline-is-any
183
184 causes PCRE to recognize any Unicode newline sequence.
185
186 Whatever line ending convention is selected when PCRE is built can be
187 overridden when the library functions are called. At build time it is
188 conventional to use the standard for your operating system.
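
The five newline options follow one naming pattern, which the following sketch makes explicit. The helper newline_flag is an invented name used only for illustration; it simply maps a convention name to the corresponding option described above.

```shell
# Hypothetical helper (not part of PCRE): turn a newline convention name
# into the matching configure option.
newline_flag() {
    case "$1" in
        cr|lf|crlf|anycrlf|any)
            echo "--enable-newline-is-$1" ;;
        *)
            echo "unknown newline convention: $1" >&2
            return 1 ;;
    esac
}

newline_flag crlf
```

The result can be passed straight to configure, for example as ./configure $(newline_flag crlf).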
189
191
192 By default, the sequence \R in a pattern matches any Unicode newline
193 sequence, whatever has been selected as the line ending sequence. If
194 you specify
195
196 --enable-bsr-anycrlf
197
198 the default is changed so that \R matches only CR, LF, or CRLF. What‐
199 ever is selected when PCRE is built can be overridden when the library
200 functions are called.
201
203
204 When the 8-bit library is called through the POSIX interface (see the
205 pcreposix documentation), additional working storage is required for
206 holding the pointers to capturing substrings, because PCRE requires
207 three integers per substring, whereas the POSIX interface provides only
208 two. If the number of expected substrings is small, the wrapper
209 function uses space on the stack, because this is faster than using
210 malloc() for each call. The default threshold above which the stack is
211 no longer used is 10; it can be changed by adding a setting such as
212
213 --with-posix-malloc-threshold=20
214
215 to the configure command.
216
218
219 Within a compiled pattern, offset values are used to point from one
220 part to another (for example, from an opening parenthesis to an alter‐
221 nation metacharacter). By default, in the 8-bit and 16-bit libraries,
222 two-byte values are used for these offsets, leading to a maximum size
223 for a compiled pattern of around 64K. This is sufficient to handle all
224 but the most gigantic patterns. Nevertheless, some people do want to
225 process truly enormous patterns, so it is possible to compile PCRE to
226 use three-byte or four-byte offsets by adding a setting such as
227
228 --with-link-size=3
229
230 to the configure command. The value given must be 2, 3, or 4. For the
231 16-bit library, a value of 3 is rounded up to 4. In these libraries,
232 using longer offsets slows down the operation of PCRE because it has to
233 load additional data when handling them. For the 32-bit library the
234 value is always 4 and cannot be overridden; the value of
235 --with-link-size is ignored.
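
The way a requested link size is interpreted for each library can be summarized in a few lines of shell. This is only a sketch of the rules stated above; the function effective_link_size is hypothetical and does not reflect how configure implements them.

```shell
# Hypothetical helper (not part of PCRE): given a library width (8, 16 or
# 32) and a requested --with-link-size value, print the size that takes
# effect according to the rules in the text.
effective_link_size() {
    width=$1
    size=$2
    case "$size" in
        2|3|4) ;;
        *) echo "link size must be 2, 3, or 4" >&2; return 1 ;;
    esac
    case "$width" in
        8)  echo "$size" ;;
        16) if [ "$size" -eq 3 ]; then echo 4; else echo "$size"; fi ;;
        32) echo 4 ;;    # always 4; the requested value is ignored
        *)  echo "library width must be 8, 16, or 32" >&2; return 1 ;;
    esac
}

effective_link_size 16 3
```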
236
238
239 When matching with the pcre_exec() function, PCRE implements backtrack‐
240 ing by making recursive calls to an internal function called match().
241 In environments where the size of the stack is limited, this can se‐
242 verely limit PCRE's operation. (The Unix environment does not usually
243 suffer from this problem, but it may sometimes be necessary to increase
244 the maximum stack size. There is a discussion in the pcrestack docu‐
245 mentation.) An alternative approach to recursion that uses memory from
246 the heap to remember data, instead of using recursive function calls,
247 has been implemented to work round the problem of limited stack size.
248 If you want to build a version of PCRE that works this way, add
249
250 --disable-stack-for-recursion
251
252 to the configure command. With this configuration, PCRE will use the
253 pcre_stack_malloc and pcre_stack_free variables to call memory manage‐
254 ment functions. By default these point to malloc() and free(), but you
255 can replace the pointers so that your own functions are used instead.
256
257 Separate functions are provided rather than using pcre_malloc and
258 pcre_free because the usage is very predictable: the block sizes
259 requested are always the same, and the blocks are always freed in
260 reverse order. A calling program might be able to implement optimized
261 functions that perform better than malloc() and free(). PCRE runs
262 noticeably more slowly when built in this way. This option affects only
263 the pcre_exec() function; it is not relevant for pcre_dfa_exec().
264
266
267 Internally, PCRE has a function called match(), which it calls repeat‐
268 edly (sometimes recursively) when matching a pattern with the
269 pcre_exec() function. By controlling the maximum number of times this
270 function may be called during a single matching operation, a limit can
271 be placed on the resources used by a single call to pcre_exec(). The
272 limit can be changed at run time, as described in the pcreapi documen‐
273 tation. The default is 10 million, but this can be changed by adding a
274 setting such as
275
276 --with-match-limit=500000
277
278 to the configure command. This setting has no effect on the
279 pcre_dfa_exec() matching function.
280
281 In some environments it is desirable to limit the depth of recursive
282 calls of match() more strictly than the total number of calls, in order
283 to restrict the maximum amount of stack (or heap, if
284 --disable-stack-for-recursion is specified) that is used. A second
285 limit controls this; it defaults to the value that is set for
286 --with-match-limit, which imposes no additional constraints. However,
287 you can set a lower limit by adding, for example,
288
289 --with-match-limit-recursion=10000
290
291 to the configure command. This value can also be overridden at run
292 time.
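
The relationship between the two limits can be sketched as follows. The helper match_limit_flags is an invented name; its clamping behaviour is an assumption made for this example, based on the observation that a recursion limit above the call limit adds no constraint.

```shell
# Hypothetical helper (not part of PCRE): print the two limit options.
# The recursion limit defaults to the match limit, as described above.
match_limit_flags() {
    limit=$1
    depth=${2:-$limit}
    if [ "$depth" -gt "$limit" ]; then
        # Recursion depth can never exceed the total number of calls,
        # so a larger value is reduced to the match limit here.
        echo "note: recursion limit $depth reduced to $limit" >&2
        depth=$limit
    fi
    echo "--with-match-limit=$limit --with-match-limit-recursion=$depth"
}

match_limit_flags 500000 10000
```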
293
295
296 PCRE uses fixed tables for processing characters whose code values are
297 less than 256. By default, PCRE is built with a set of tables that are
298 distributed in the file pcre_chartables.c.dist. These tables are for
299 ASCII codes only. If you add
300
301 --enable-rebuild-chartables
302
303 to the configure command, the distributed tables are no longer used.
304 Instead, a program called dftables is compiled and run. This outputs
305 the source for a new set of tables, created in the default locale of your
306 C run-time system. (This method of replacing the tables does not work
307 if you are cross compiling, because dftables is run on the local host.
308 If you need to create alternative tables when cross compiling, you will
309 have to do so "by hand".)
310
312
313 PCRE assumes by default that it will run in an environment where the
314 character code is ASCII (or Unicode, which is a superset of ASCII).
315 This is the case for most computer operating systems. PCRE can, how‐
316 ever, be compiled to run in an EBCDIC environment by adding
317
318 --enable-ebcdic
319
320 to the configure command. This setting implies
321 --enable-rebuild-chartables. You should use it only if you know that you
322 are in an EBCDIC environment (for example, an IBM mainframe operating
323 system). The --enable-ebcdic option is incompatible with --enable-utf.
324
325 The EBCDIC character that corresponds to an ASCII LF is assumed to have
326 the value 0x15 by default. However, in some EBCDIC environments, 0x25
327 is used. In such an environment you should use
328
329 --enable-ebcdic-nl25
330
331 as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
332 has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and
333 0x25 is not chosen as LF is made to correspond to the Unicode NEL char‐
334 acter (which, in Unicode, is 0x85).
335
336 The options that select newline behaviour, such as
337 --enable-newline-is-cr, and equivalent run-time options, refer to these
338 character values in an EBCDIC environment.
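
The character values that result from the two EBCDIC options can be tabulated with a short sketch. The function ebcdic_newline_codes is hypothetical and exists only to restate the mapping given above.

```shell
# Hypothetical helper (not part of PCRE): print the code values used for
# CR, LF and NEL, depending on which EBCDIC options are present.
ebcdic_newline_codes() {
    mode=none
    for arg in "$@"; do
        case "$arg" in
            # Plain EBCDIC selects 0x15 unless nl25 was also requested.
            --enable-ebcdic)      [ "$mode" = nl25 ] || mode=nl15 ;;
            --enable-ebcdic-nl25) mode=nl25 ;;
        esac
    done
    case "$mode" in
        nl15) echo "CR=0x0d LF=0x15 NEL=0x25" ;;
        nl25) echo "CR=0x0d LF=0x25 NEL=0x15" ;;
        *)    echo "not an EBCDIC build" >&2; return 1 ;;
    esac
}

ebcdic_newline_codes --enable-ebcdic --enable-ebcdic-nl25
```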
339
341
342 By default, pcregrep reads all files as plain text. You can build it so
343 that it recognizes files whose names end in .gz or .bz2, and reads them
344 with libz or libbz2, respectively, by adding one or both of
345
346 --enable-pcregrep-libz
347 --enable-pcregrep-libbz2
348
349 to the configure command. These options naturally require that the rel‐
350 evant libraries are installed on your system. Configuration will fail
351 if they are not.
352
354
355 pcregrep uses an internal buffer to hold a "window" on the file it is
356 scanning, in order to be able to output "before" and "after" lines when
357 it finds a match. The size of the buffer is controlled by a parameter
358 whose default value is 20K. The buffer itself is three times this size,
359 but because of the way it is used for holding "before" lines, the long‐
360 est line that is guaranteed to be processable is the parameter size.
361 You can change the default parameter value by adding, for example,
362
363 --with-pcregrep-bufsize=50K
364
365 to the configure command. The caller of pcregrep can, however, override
366 this value by specifying a run-time option.
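
To make the arithmetic concrete, the sketch below computes the size of the buffer from the parameter value. It assumes that a K suffix means 1024 bytes, which the text above does not state; the function name pcregrep_buffer_bytes is likewise invented for the example.

```shell
# Hypothetical helper (not part of PCRE): the buffer is three times the
# parameter size. Assumption: a trailing K means a multiple of 1024.
pcregrep_buffer_bytes() {
    case "$1" in
        *K) param=$(( ${1%K} * 1024 )) ;;
        *)  param=$1 ;;
    esac
    echo $(( param * 3 ))
}

pcregrep_buffer_bytes 20K    # the default parameter value
```

Under that assumption, the default parameter of 20K corresponds to a buffer of 61440 bytes, while the longest line guaranteed to be processable remains 20480 bytes.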
367
369
370 If you add
371
372 --enable-pcretest-libreadline
373
374 to the configure command, pcretest is linked with the libreadline
375 library, and when its input is from a terminal, it reads it using the
376 readline() function. This provides line-editing and history facilities.
377 Note that libreadline is GPL-licensed, so if you distribute a binary of
378 pcretest linked in this way, there may be licensing issues.
379
380 Setting this option causes the -lreadline option to be added to the
381 pcretest build. In many operating environments with a system-installed
382 libreadline this is sufficient. However, in some environments (e.g. if
383 an unmodified distribution version of readline is in use), some extra
384 configuration may be necessary. The INSTALL file for libreadline says
385 this:
386
387 "Readline uses the termcap functions, but does not link with the
388 termcap or curses library itself, allowing applications which link
389 with readline the to choose an appropriate library."
390
391 If your environment has not been set up so that an appropriate library
392 is automatically included, you may need to add something like
393
394 LIBS="-lncurses"
395
396 immediately before the configure command.
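
Put together, a full invocation might look like the sketch below. It assumes that ncurses is the appropriate terminal library on your system and that it is linked with -lncurses; the variable name READLINE_LIBS is only for illustration.

```shell
# Assumption: ncurses supplies the termcap functions that readline needs.
READLINE_LIBS="-lncurses"

# Run only when a configure script is present in the current directory.
if [ -x ./configure ]; then
    # Setting LIBS on the same command line passes it to configure only.
    LIBS="$READLINE_LIBS" ./configure --enable-pcretest-libreadline
else
    echo "would run: LIBS=\"$READLINE_LIBS\" ./configure --enable-pcretest-libreadline"
fi
```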
397
399
400 By adding the
401
402 --enable-valgrind
403
404 option to the configure command, PCRE will use valgrind annotations
405 to mark certain memory regions as unaddressable. This allows it to
406 detect invalid memory accesses, and is mostly useful for debugging PCRE
407 itself.
408
410
411 If your C compiler is gcc, you can build a version of PCRE that can
412 generate a code coverage report for its test suite. To enable this, you
413 must install lcov version 1.6 or above. Then specify
414
415 --enable-coverage
416
417 to the configure command and build PCRE in the usual way.
418
419 Note that using ccache (a caching C compiler) is incompatible with code
420 coverage reporting. If you have configured ccache to run automatically
421 on your system, you must set the environment variable
422
423 CCACHE_DISABLE=1
424
425 before running make to build PCRE, so that ccache is not used.
426
427 When --enable-coverage is used, the following additional targets are
428 added to the Makefile:
429
430 make coverage
431
432 This creates a fresh coverage report for the PCRE test suite. It is
433 equivalent to running "make coverage-reset", "make coverage-baseline",
434 "make check", and then "make coverage-report".
435
436 make coverage-reset
437
438 This zeroes the coverage counters, but does nothing else.
439
440 make coverage-baseline
441
442 This captures baseline coverage information.
443
444 make coverage-report
445
446 This creates the coverage report.
447
448 make coverage-clean-report
449
450 This removes the generated coverage report without cleaning the cover‐
451 age data itself.
452
453 make coverage-clean-data
454
455 This removes the captured coverage data without removing the coverage
456 files created at compile time (*.gcno).
457
458 make coverage-clean
459
460 This cleans all coverage data including the generated coverage report.
461 For more information about code coverage, see the gcov and lcov docu‐
462 mentation.
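
A complete coverage run, using only the option, variable, and make targets described above, might be sketched as follows. The guard on ./configure is added so that the fragment does nothing outside a PCRE source tree.

```shell
# Keep ccache out of the build, as required for coverage reporting.
export CCACHE_DISABLE=1

# Run only when a configure script is present in the current directory.
if [ -x ./configure ]; then
    ./configure --enable-coverage &&
    make &&
    # "make coverage" is equivalent to running coverage-reset,
    # coverage-baseline, check, and coverage-report in that order.
    make coverage
fi
```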
463
465
466 pcreapi(3), pcre16, pcre32, pcre_config(3).
467
469
470 Philip Hazel
471 University Computing Service
472 Cambridge CB2 3QH, England.
473
475
476 Last updated: 12 May 2013
477 Copyright (c) 1997-2013 University of Cambridge.
478
479
480
481PCRE 8.33 12 May 2013 PCREBUILD(3)