PCREBUILD(3)               Library Functions Manual               PCREBUILD(3)

PCRE - Perl-compatible regular expressions


This document describes the optional features of PCRE that can be
selected when the library is compiled. It assumes use of the configure
script, where the optional features are selected or deselected by
providing options to configure before running the make command.
However, the same options can be selected in both Unix-like and
non-Unix-like environments using the GUI facility of cmake-gui if you
are using CMake instead of configure to build PCRE.

There is a lot more information about building PCRE without using
configure (including information about using CMake or building "by
hand") in the file called NON-AUTOTOOLS-BUILD, which is part of the
PCRE distribution. You should consult this file as well as the README
file if you are building in a non-Unix-like environment.

The complete list of options for configure (which includes the standard
ones such as the selection of the installation directory) can be
obtained by running

    ./configure --help

The following sections include descriptions of options whose names
begin with --enable or --disable. These settings specify changes to the
defaults for the configure command. Because of the way that configure
works, --enable and --disable always come in pairs, so the
complementary option always exists as well, but as it specifies the
default, it is not described.


By default, a library called libpcre is built, containing functions
that take string arguments contained in vectors of bytes, either as
single-byte characters, or interpreted as UTF-8 strings. You can also
build a separate library, called libpcre16, in which strings are
contained in vectors of 16-bit data units and interpreted either as
single-unit characters or UTF-16 strings, by adding

    --enable-pcre16

to the configure command. You can also build a separate library, called
libpcre32, in which strings are contained in vectors of 32-bit data
units and interpreted either as single-unit characters or UTF-32
strings, by adding

    --enable-pcre32

to the configure command. If you do not want the 8-bit library, add

    --disable-pcre8

as well. At least one of the three libraries must be built. Note that
the C++ and POSIX wrappers are for the 8-bit library only, and that
pcregrep is an 8-bit program. None of these are built if you select
only the 16-bit or 32-bit libraries.
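
As a rough illustration of combining these options, the sketch below
assembles a configure command that builds only the 16-bit and 32-bit
libraries. The variable name width_opts is an arbitrary choice for this
example and is not part of PCRE. The command is echoed rather than
executed so that the snippet is harmless outside a PCRE source tree;
remove the echo to run it for real.

```shell
# Build libpcre16 and libpcre32 but not the 8-bit libpcre.
# width_opts is an illustrative variable name, not part of PCRE.
width_opts="--disable-pcre8 --enable-pcre16 --enable-pcre32"

# Dry run: print the command instead of executing it.
# $width_opts is deliberately unquoted so it splits into three options.
echo ./configure $width_opts
```

With this selection the C++ and POSIX wrappers and pcregrep are not
built, because they depend on the 8-bit library.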


The PCRE building process uses libtool to build both shared and static
Unix libraries by default. You can suppress one of these by adding one
of

    --disable-shared
    --disable-static

to the configure command, as required.


By default, if the 8-bit library is being built, the configure script
will search for a C++ compiler and C++ header files. If it finds them,
it automatically builds the C++ wrapper library (which supports only
8-bit strings). You can disable this by adding

    --disable-cpp

to the configure command.


To build PCRE with support for UTF Unicode character strings, add

    --enable-utf

to the configure command. This setting applies to all three libraries,
adding support for UTF-8 to the 8-bit library, support for UTF-16 to
the 16-bit library, and support for UTF-32 to the 32-bit library. There
are no separate options for enabling UTF-8, UTF-16 and UTF-32
independently because that would allow ridiculous settings such as
requesting UTF-16 support while building only the 8-bit library. It is
not possible to build one library with UTF support and another without
in the same configuration. (For backwards compatibility, --enable-utf8
is a synonym of --enable-utf.)

Of itself, this setting does not make PCRE treat strings as UTF-8,
UTF-16 or UTF-32. As well as compiling PCRE with this option, you also
have to set the PCRE_UTF8, PCRE_UTF16 or PCRE_UTF32 option (as
appropriate) when you call one of the pattern compiling functions.

If you set --enable-utf when compiling in an EBCDIC environment, PCRE
expects its input to be either ASCII or UTF-8 (depending on the
run-time option). It is not possible to support both EBCDIC and UTF-8
codes in the same version of the library. Consequently, --enable-utf
and --enable-ebcdic are mutually exclusive.
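
If you generate configure command lines from a script, the exclusion
rule can be checked before configure is run. The following is a minimal
sketch under that assumption. check_opts is a hypothetical helper
written for this example and is not part of the PCRE build system; how
configure itself reports the conflict is not covered here.

```shell
# check_opts is a hypothetical helper, not part of PCRE.
# It returns non-zero if the argument list contains both --enable-utf
# (or its synonym --enable-utf8) and --enable-ebcdic.
check_opts() {
    utf=0
    ebcdic=0
    for opt in "$@"; do
        case "$opt" in
            --enable-utf|--enable-utf8) utf=1 ;;
            --enable-ebcdic)            ebcdic=1 ;;
        esac
    done
    if [ "$utf" -eq 1 ] && [ "$ebcdic" -eq 1 ]; then
        echo "error: --enable-utf and --enable-ebcdic are mutually exclusive" >&2
        return 1
    fi
    return 0
}

# A compatible combination passes; the conflicting one is rejected.
check_opts --enable-utf --enable-pcre16 && echo "accepted"
check_opts --enable-utf --enable-ebcdic || echo "rejected"
```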


UTF support allows the libraries to process character codepoints up to
0x10ffff in the strings that they handle. On its own, however, it does
not provide any facilities for accessing the properties of such
characters. If you want to be able to use the pattern escapes \P, \p,
and \X, which refer to Unicode character properties, you must add

    --enable-unicode-properties

to the configure command. This implies UTF support, even if you have
not explicitly requested it.

Including Unicode property support adds around 30K of tables to the
PCRE library. Only the general category properties such as Lu and Nd
are supported. Details are given in the pcrepattern documentation.


Just-in-time compiler support is included in the build by specifying

    --enable-jit

This support is available only for certain hardware architectures. If
this option is set for an unsupported architecture, a compile time
error occurs. See the pcrejit documentation for a discussion of JIT
usage. When JIT support is enabled, pcregrep automatically makes use of
it, unless you add

    --disable-pcregrep-jit

to the configure command.


By default, PCRE interprets the linefeed (LF) character as indicating
the end of a line. This is the normal newline character on Unix-like
systems. You can compile PCRE to use carriage return (CR) instead, by
adding

    --enable-newline-is-cr

to the configure command. There is also a --enable-newline-is-lf
option, which explicitly specifies linefeed as the newline character.

Alternatively, you can specify that line endings are to be indicated by
the two-character sequence CRLF. If you want this, add

    --enable-newline-is-crlf

to the configure command. There is a fourth option, specified by

    --enable-newline-is-anycrlf

which causes PCRE to recognize any of the three sequences CR, LF, or
CRLF as indicating a line ending. Finally, a fifth option, specified by

    --enable-newline-is-any

causes PCRE to recognize any Unicode newline sequence.

Whatever line ending convention is selected when PCRE is built can be
overridden when the library functions are called. At build time it is
conventional to use the standard for your operating system.
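
For reference, the short sketch below prints the byte values of the
CR, LF and CRLF sequences that the first three options refer to. It
assumes an ASCII environment and uses only the standard printf and od
utilities; nothing in it is specific to PCRE.

```shell
# Print the byte values (hexadecimal) of CR, LF and CRLF in an ASCII
# environment. The format strings are fixed literals, so passing them
# to printf as the format argument is safe here.
for seq in '\r' '\n' '\r\n'; do
    printf "$seq" | od -An -tx1
done
```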


By default, the sequence \R in a pattern matches any Unicode newline
sequence, whatever has been selected as the line ending sequence. If
you specify

    --enable-bsr-anycrlf

the default is changed so that \R matches only CR, LF, or CRLF.
Whatever is selected when PCRE is built can be overridden when the
library functions are called.


When the 8-bit library is called through the POSIX interface (see the
pcreposix documentation), additional working storage is required for
holding the pointers to capturing substrings, because PCRE requires
three integers per substring, whereas the POSIX interface provides only
two. If the number of expected substrings is small, the wrapper
function uses space on the stack, because this is faster than using
malloc() for each call. The default threshold above which the stack is
no longer used is 10; it can be changed by adding a setting such as

    --with-posix-malloc-threshold=20

to the configure command.


Within a compiled pattern, offset values are used to point from one
part to another (for example, from an opening parenthesis to an
alternation metacharacter). By default, in the 8-bit and 16-bit
libraries, two-byte values are used for these offsets, leading to a
maximum size for a compiled pattern of around 64K. This is sufficient
to handle all but the most gigantic patterns. Nevertheless, some people
do want to process truly enormous patterns, so it is possible to
compile PCRE to use three-byte or four-byte offsets by adding a setting
such as

    --with-link-size=3

to the configure command. The value given must be 2, 3, or 4. For the
16-bit library, a value of 3 is rounded up to 4. In these libraries,
using longer offsets slows down the operation of PCRE because it has to
load additional data when handling them. For the 32-bit library the
value is always 4 and cannot be overridden; the value of
--with-link-size is ignored.
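
As a rough check on the 64K figure, the shell arithmetic below prints
how many distinct values an offset field of each permitted size can
hold. This is only the bound implied by the width of the field, assuming
8-bit bytes and a shell with at least 64-bit arithmetic. The limit that
the library actually enforces may be lower, particularly for the largest
link size.

```shell
# For each permitted link size (in bytes), print the number of distinct
# offset values the field can represent: 2^(8 * size).
for size in 2 3 4; do
    echo "link size $size: $(( 1 << (8 * size) )) offset values"
done
```

The first line of output corresponds to the default link size of 2,
where 65536 values give the limit of around 64K described above.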


When matching with the pcre_exec() function, PCRE implements
backtracking by making recursive calls to an internal function called
match(). In environments where the size of the stack is limited, this
can severely limit PCRE's operation. (The Unix environment does not
usually suffer from this problem, but it may sometimes be necessary to
increase the maximum stack size. There is a discussion in the pcrestack
documentation.) An alternative approach to recursion that uses memory
from the heap to remember data, instead of using recursive function
calls, has been implemented to work round the problem of limited stack
size. If you want to build a version of PCRE that works this way, add

    --disable-stack-for-recursion

to the configure command. With this configuration, PCRE will use the
pcre_stack_malloc and pcre_stack_free variables to call memory
management functions. By default these point to malloc() and free(),
but you can replace the pointers so that your own functions are used
instead.

Separate functions are provided rather than using pcre_malloc and
pcre_free because the usage is very predictable: the block sizes
requested are always the same, and the blocks are always freed in
reverse order. A calling program might be able to implement optimized
functions that perform better than malloc() and free(). PCRE runs
noticeably more slowly when built in this way. This option affects only
the pcre_exec() function; it is not relevant for pcre_dfa_exec().
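
Before choosing the heap-based build, it can be worth checking how much
stack the environment actually allows. The sketch below assumes a shell
such as bash, dash or ksh in which the ulimit builtin accepts -s;
ulimit is a feature of the shell, not of PCRE, and its units (commonly
kilobytes) depend on the shell.

```shell
# Print the current soft limit on stack size for this shell and the
# processes it starts. The value is a number or the word "unlimited".
stack_limit=$(ulimit -s)
echo "stack size limit: $stack_limit"
```

If the reported limit is small and can be raised, increasing it may be
enough, and the default recursive build can be kept.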


Internally, PCRE has a function called match(), which it calls
repeatedly (sometimes recursively) when matching a pattern with the
pcre_exec() function. By controlling the maximum number of times this
function may be called during a single matching operation, a limit can
be placed on the resources used by a single call to pcre_exec(). The
limit can be changed at run time, as described in the pcreapi
documentation. The default is 10 million, but this can be changed by
adding a setting such as

    --with-match-limit=500000

to the configure command. This setting has no effect on the
pcre_dfa_exec() matching function.

In some environments it is desirable to limit the depth of recursive
calls of match() more strictly than the total number of calls, in order
to restrict the maximum amount of stack (or heap, if
--disable-stack-for-recursion is specified) that is used. A second
limit controls this; it defaults to the value that is set for
--with-match-limit, which imposes no additional constraints. However,
you can set a lower limit by adding, for example,

    --with-match-limit-recursion=10000

to the configure command. This value can also be overridden at run
time.
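
The two limits are usually set together. The sketch below uses the
example values from the text and warns if the recursion limit is
higher than the match limit, since such a value adds no constraint.
The variable names are illustrative, and the command is echoed rather
than executed; remove the echo to run it inside a PCRE source tree.

```shell
# Illustrative variable names; the values are the examples used above.
match_limit=500000
recursion_limit=10000

# The recursion depth can never exceed the total number of calls, so a
# recursion limit above the match limit is probably a mistake.
if [ "$recursion_limit" -gt "$match_limit" ]; then
    echo "warning: recursion limit exceeds match limit" >&2
fi

# Dry run: print the command instead of executing it.
echo ./configure --with-match-limit="$match_limit" \
                 --with-match-limit-recursion="$recursion_limit"
```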


PCRE uses fixed tables for processing characters whose code values are
less than 256. By default, PCRE is built with a set of tables that are
distributed in the file pcre_chartables.c.dist. These tables are for
ASCII codes only. If you add

    --enable-rebuild-chartables

to the configure command, the distributed tables are no longer used.
Instead, a program called dftables is compiled and run. This outputs
the source for a new set of tables, created in the default locale of
your C run-time system. (This method of replacing the tables does not
work if you are cross compiling, because dftables is run on the local
host. If you need to create alternative tables when cross compiling,
you will have to do so "by hand".)


PCRE assumes by default that it will run in an environment where the
character code is ASCII (or Unicode, which is a superset of ASCII).
This is the case for most computer operating systems. PCRE can,
however, be compiled to run in an EBCDIC environment by adding

    --enable-ebcdic

to the configure command. This setting implies
--enable-rebuild-chartables. You should only use it if you know that
you are in an EBCDIC environment (for example, an IBM mainframe
operating system). The --enable-ebcdic option is incompatible with
--enable-utf.

The EBCDIC character that corresponds to an ASCII LF is assumed to have
the value 0x15 by default. However, in some EBCDIC environments, 0x25
is used. In such an environment you should use

    --enable-ebcdic-nl25

as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and
0x25 is not chosen as LF is made to correspond to the Unicode NEL
character (which, in Unicode, is 0x85).

The options that select newline behaviour, such as
--enable-newline-is-cr, and equivalent run-time options, refer to these
character values in an EBCDIC environment.


By default, pcregrep reads all files as plain text. You can build it so
that it recognizes files whose names end in .gz or .bz2, and reads them
with libz or libbz2, respectively, by adding one or both of

    --enable-pcregrep-libz
    --enable-pcregrep-libbz2

to the configure command. These options naturally require that the
relevant libraries are installed on your system. Configuration will
fail if they are not.


pcregrep uses an internal buffer to hold a "window" on the file it is
scanning, in order to be able to output "before" and "after" lines when
it finds a match. The size of the buffer is controlled by a parameter
whose default value is 20K. The buffer itself is three times this size,
but because of the way it is used for holding "before" lines, the
longest line that is guaranteed to be processable is the parameter
size. You can change the default parameter value by adding, for example,

    --with-pcregrep-bufsize=50K

to the configure command. The caller of pcregrep can, however, override
this value by specifying a run-time option.
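
The relationship between the parameter and the real buffer size can be
worked through with shell arithmetic. This sketch uses the 50K example
above and assumes that K means 1024 bytes; param_kb is an illustrative
variable name.

```shell
# Parameter value in kilobytes (assuming 1K = 1024 bytes).
param_kb=50

# The longest line guaranteed to be processable is the parameter size.
echo "longest guaranteed line: $(( param_kb * 1024 )) bytes"

# The buffer that pcregrep allocates is three times the parameter size.
echo "buffer size:             $(( 3 * param_kb * 1024 )) bytes"
```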


If you add

    --enable-pcretest-libreadline

to the configure command, pcretest is linked with the libreadline
library, and when its input is from a terminal, it reads it using the
readline() function. This provides line-editing and history facilities.
Note that libreadline is GPL-licensed, so if you distribute a binary of
pcretest linked in this way, there may be licensing issues.

Setting this option causes the -lreadline option to be added to the
pcretest build. In many operating environments with a system-installed
libreadline this is sufficient. However, in some environments (e.g. if
an unmodified distribution version of readline is in use), some extra
configuration may be necessary. The INSTALL file for libreadline says
this:

  "Readline uses the termcap functions, but does not link with the
  termcap or curses library itself, allowing applications which link
  with readline the to choose an appropriate library."

If your environment has not been set up so that an appropriate library
is automatically included, you may need to add something like

    LIBS="-lncurses"

immediately before the configure command.
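
"Immediately before" here means on the same command line, as a variable
assignment prefix. The sketch below shows how such a prefix reaches the
command that follows it. The sh -c invocation is only a stand-in for
./configure so that the snippet runs anywhere, and -lncurses is the
usual linker spelling for the ncurses library.

```shell
# A VAR=value prefix places the variable in the environment of that one
# command only. Here "sh -c" stands in for ./configure.
LIBS="-lncurses" sh -c 'echo "LIBS seen by the command: $LIBS"'
```

In a PCRE source tree the equivalent real command would be the same
prefix followed by ./configure --enable-pcretest-libreadline.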


If you add the

    --enable-valgrind

option to the configure command, PCRE will use valgrind annotations to
mark certain memory regions as unaddressable. This allows valgrind to
detect invalid memory accesses, and is mostly useful for debugging PCRE
itself.


If your C compiler is gcc, you can build a version of PCRE that can
generate a code coverage report for its test suite. To enable this, you
must install lcov version 1.6 or above. Then specify

    --enable-coverage

to the configure command and build PCRE in the usual way.

Note that using ccache (a caching C compiler) is incompatible with code
coverage reporting. If you have configured ccache to run automatically
on your system, you must set the environment variable

    CCACHE_DISABLE=1

before running make to build PCRE, so that ccache is not used.

When --enable-coverage is used, the following additional targets are
added to the Makefile:

    make coverage

This creates a fresh coverage report for the PCRE test suite. It is
equivalent to running "make coverage-reset", "make coverage-baseline",
"make check", and then "make coverage-report".

    make coverage-reset

This zeroes the coverage counters, but does nothing else.

    make coverage-baseline

This captures baseline coverage information.

    make coverage-report

This creates the coverage report.

    make coverage-clean-report

This removes the generated coverage report without cleaning the
coverage data itself.

    make coverage-clean-data

This removes the captured coverage data without removing the coverage
files created at compile time (*.gcno).

    make coverage-clean

This cleans all coverage data including the generated coverage report.
For more information about code coverage, see the gcov and lcov
documentation.
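
The sequence that "make coverage" is described as equivalent to can be
written out as a loop, which is convenient if you want to stop after a
particular step. This is a dry-run sketch: each command is echoed
rather than executed, and the CCACHE_DISABLE=1 prefix is included only
on the assumption that ccache may be configured to run automatically.

```shell
# Dry run of the four steps that "make coverage" is equivalent to.
# Remove the echo to execute them in a tree configured with
# --enable-coverage; CCACHE_DISABLE=1 then applies to each make run.
for target in coverage-reset coverage-baseline check coverage-report; do
    echo CCACHE_DISABLE=1 make "$target"
done
```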


pcreapi(3), pcre16, pcre32, pcre_config(3).


Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.


Last updated: 30 October 2012
Copyright (c) 1997-2012 University of Cambridge.

PCRE 8.32                       30 October 2012                   PCREBUILD(3)