PCRE2BUILD(3) Library Functions Manual PCRE2BUILD(3)

PCRE2 - Perl-compatible regular expressions (revised API)


PCRE2 is distributed with a configure script that can be used to build
the library in Unix-like environments using the applications known as
Autotools. Also in the distribution are files to support building using
CMake instead of configure. The text file README contains general
information about building with Autotools (some of which is repeated
below), and also has some comments about building on various operating
systems. There is a lot more information about building PCRE2 without
using Autotools (including information about using CMake and building
"by hand") in the text file called NON-AUTOTOOLS-BUILD. You should
consult this file as well as the README file if you are building in a
non-Unix-like environment.


The rest of this document describes the optional features of PCRE2 that
can be selected when the library is compiled. It assumes use of the
configure script, where the optional features are selected or deselected
by providing options to configure before running the make command.
However, the same options can be selected in both Unix-like and
non-Unix-like environments if you are using CMake instead of configure
to build PCRE2.

If you are not using Autotools or CMake, option selection can be done
by editing the config.h file, or by passing parameter settings to the
compiler, as described in NON-AUTOTOOLS-BUILD.

The complete list of options for configure (which includes the standard
ones such as the selection of the installation directory) can be
obtained by running

  ./configure --help

The following sections include descriptions of options whose names
begin with --enable or --disable. These settings specify changes to the
defaults for the configure command. Because of the way that configure
works, --enable and --disable always come in pairs, so the complementary
option always exists as well, but as it specifies the default, it is not
described.


By default, a library called libpcre2-8 is built, containing functions
that take string arguments contained in vectors of bytes, interpreted
either as single-byte characters, or UTF-8 strings. You can also build
two other libraries, called libpcre2-16 and libpcre2-32, which process
strings that are contained in vectors of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters
or UTF-16/UTF-32 strings. To build these additional libraries, add one
or both of the following to the configure command:

  --enable-pcre2-16
  --enable-pcre2-32

If you do not want the 8-bit library, add

  --disable-pcre2-8

as well. At least one of the three libraries must be built. Note that
the POSIX wrapper is for the 8-bit library only, and that pcre2grep is
an 8-bit program. Neither of these is built if you select only the
16-bit or 32-bit libraries.
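
As an illustration of how these three options combine, the following
shell sketch builds a configure argument list from the code unit widths
that are wanted. The function name pcre2_width_args is hypothetical and
is not part of the PCRE2 distribution; only the three option names are
taken from the text above.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of wanted widths (8, 16, 32) into
# configure options. The 8-bit library is built by default, so it only
# needs an option when it is NOT wanted.
pcre2_width_args() {
  if [ "$#" -eq 0 ]; then
    echo "at least one of 8, 16, 32 is required" >&2
    return 1
  fi
  want8=no
  args=""
  for w in "$@"; do
    case "$w" in
      8)  want8=yes ;;
      16) args="$args --enable-pcre2-16" ;;
      32) args="$args --enable-pcre2-32" ;;
      *)  echo "unknown width: $w" >&2; return 1 ;;
    esac
  done
  if [ "$want8" = no ]; then
    args="$args --disable-pcre2-8"
  fi
  # Drop the leading space left over from concatenation.
  echo "${args# }"
}

# Print (rather than run) the command for a 16-bit and 32-bit only build.
echo "./configure $(pcre2_width_args 16 32)"
```

The printed command is meant to be run from the top of the source tree.
A build without the 8-bit library also omits the POSIX wrapper and
pcre2grep, as noted above.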


The Autotools PCRE2 building process uses libtool to build both shared
and static libraries by default. You can suppress an unwanted library
by adding one of

  --disable-shared
  --disable-static

to the configure command.


By default, PCRE2 is built with support for Unicode and UTF character
strings. To build it without Unicode support, add

  --disable-unicode

to the configure command. This setting applies to all three libraries.
It is not possible to build one library with Unicode support, and
another without, in the same configuration.

Of itself, Unicode support does not make PCRE2 treat strings as UTF-8,
UTF-16 or UTF-32. To do that, applications that use the library can set
the PCRE2_UTF option when they call pcre2_compile() to compile a
pattern. Alternatively, patterns may be started with (*UTF) unless the
application has locked this out by setting PCRE2_NEVER_UTF.

UTF support allows the libraries to process character code points up to
0x10ffff in the strings that they handle. It also provides support for
accessing the Unicode properties of such characters, using pattern
escapes such as \P, \p, and \X. Only the general category properties
such as Lu and Nd are supported. Details are given in the pcre2pattern
documentation.

Pattern escapes such as \d and \w do not by default make use of Unicode
properties. The application can request that they do by setting the
PCRE2_UCP option. Unless the application has set PCRE2_NEVER_UCP, a
pattern may also request this by starting with (*UCP).


The \C escape sequence, which matches a single code unit, even in a UTF
mode, can cause unpredictable behaviour because it may leave the
current matching point in the middle of a multi-code-unit character.
The application can lock it out by setting the PCRE2_NEVER_BACKSLASH_C
option when calling pcre2_compile(). There is also a build-time option

  --enable-never-backslash-C

(note the upper case C) which locks out the use of \C entirely.


Just-in-time compiler support is included in the build by specifying

  --enable-jit

This support is available only for certain hardware architectures. If
this option is set for an unsupported architecture, a build error
occurs. See the pcre2jit documentation for a discussion of JIT usage.
When JIT support is enabled, pcre2grep automatically makes use of it,
unless you add

  --disable-pcre2grep-jit

to the configure command.


By default, PCRE2 interprets the linefeed (LF) character as indicating
the end of a line. This is the normal newline character on Unix-like
systems. You can compile PCRE2 to use carriage return (CR) instead, by
adding

  --enable-newline-is-cr

to the configure command. There is also an --enable-newline-is-lf
option, which explicitly specifies linefeed as the newline character.

Alternatively, you can specify that line endings are to be indicated by
the two-character sequence CRLF (CR immediately followed by LF). If you
want this, add

  --enable-newline-is-crlf

to the configure command. There is a fourth option, specified by

  --enable-newline-is-anycrlf

which causes PCRE2 to recognize any of the three sequences CR, LF, or
CRLF as indicating a line ending. Finally, a fifth option, specified by

  --enable-newline-is-any

causes PCRE2 to recognize any Unicode newline sequence. The Unicode
newline sequences are the three just mentioned, plus the single
characters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next
line, U+0085), LS (line separator, U+2028), and PS (paragraph
separator, U+2029).

Whatever default line ending convention is selected when PCRE2 is built
can be overridden by applications that use the library. At build time
it is conventional to use the standard for your operating system.
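
The five build-time newline settings map onto option names in a regular
way. As a small sketch, the hypothetical helper below (it is not part
of PCRE2) checks a convention name and prints the matching configure
option; the option names themselves are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: validate a newline convention name and print the
# corresponding configure option.
newline_option() {
  case "$1" in
    cr|lf|crlf|anycrlf|any)
      echo "--enable-newline-is-$1" ;;
    *)
      echo "unknown newline convention: $1" >&2
      return 1 ;;
  esac
}

# Print (rather than run) a configure command that selects CRLF.
echo "./configure $(newline_option crlf)"
```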


By default, the sequence \R in a pattern matches any Unicode newline
sequence, independently of what has been selected as the line ending
sequence. If you specify

  --enable-bsr-anycrlf

the default is changed so that \R matches only CR, LF, or CRLF.
Whatever is selected when PCRE2 is built can be overridden by
applications that use the library.


Within a compiled pattern, offset values are used to point from one
part to another (for example, from an opening parenthesis to an
alternation metacharacter). By default, in the 8-bit and 16-bit
libraries, two-byte values are used for these offsets, leading to a
maximum size for a compiled pattern of around 64K code units. This is
sufficient to handle all but the most gigantic patterns. Nevertheless,
some people do want to process truly enormous patterns, so it is
possible to compile PCRE2 to use three-byte or four-byte offsets by
adding a setting such as

  --with-link-size=3

to the configure command. The value given must be 2, 3, or 4. For the
16-bit library, a value of 3 is rounded up to 4. In these libraries,
using longer offsets slows down the operation of PCRE2 because it has
to load additional data when handling them. For the 32-bit library the
value is always 4 and cannot be overridden; the value of
--with-link-size is ignored.
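
The rounding rules in the previous paragraph can be summarized in a few
lines of shell. This is only a sketch of the rules as described here,
not code from the PCRE2 build system, and the function name
effective_link_size is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given a library width (8, 16, 32) and the value
# passed to --with-link-size, print the link size described in the text.
effective_link_size() {
  width=$1
  requested=$2
  case "$requested" in
    2|3|4) ;;
    *) echo "link size must be 2, 3, or 4" >&2; return 1 ;;
  esac
  case "$width" in
    8)  echo "$requested" ;;                  # used as given
    16) if [ "$requested" -eq 3 ]; then       # 3 is rounded up to 4
          echo 4
        else
          echo "$requested"
        fi ;;
    32) echo 4 ;;                             # always 4, setting ignored
    *)  echo "width must be 8, 16, or 32" >&2; return 1 ;;
  esac
}

for width in 8 16 32; do
  echo "libpcre2-$width with --with-link-size=3 uses $(effective_link_size "$width" 3)"
done
```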


When matching with the pcre2_match() function, PCRE2 implements
backtracking by making recursive calls to an internal function called
match(). In environments where the size of the stack is limited, this
can severely limit PCRE2's operation. (The Unix environment does not
usually suffer from this problem, but it may sometimes be necessary to
increase the maximum stack size. There is a discussion in the
pcre2stack documentation.) An alternative approach to recursion that
uses memory from the heap to remember data, instead of using recursive
function calls, has been implemented to work round the problem of
limited stack size. If you want to build a version of PCRE2 that works
this way, add

  --disable-stack-for-recursion

to the configure command. By default, the system functions malloc() and
free() are called to manage the heap memory that is required, but
custom memory management functions can be called instead. PCRE2 runs
noticeably more slowly when built in this way. This option affects only
the pcre2_match() function; it is not relevant for pcre2_dfa_match().


Internally, PCRE2 has a function called match(), which it calls
repeatedly (sometimes recursively) when matching a pattern with the
pcre2_match() function. By controlling the maximum number of times this
function may be called during a single matching operation, a limit can
be placed on the resources used by a single call to pcre2_match(). The
limit can be changed at run time, as described in the pcre2api
documentation. The default is 10 million, but this can be changed by
adding a setting such as

  --with-match-limit=500000

to the configure command. This setting has no effect on the
pcre2_dfa_match() matching function.

In some environments it is desirable to limit the depth of recursive
calls of match() more strictly than the total number of calls, in order
to restrict the maximum amount of stack (or heap, if
--disable-stack-for-recursion is specified) that is used. A second
limit controls this; it defaults to the value that is set for
--with-match-limit, which imposes no additional constraints. However,
you can set a lower limit by adding, for example,

  --with-match-limit-recursion=10000

to the configure command. This value can also be overridden at run
time.
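
As a sketch of how the two limits relate, the hypothetical helper below
(not part of PCRE2) prints the configure options for a given match
limit and an optional recursion limit. It leaves out the recursion
option when the value is not lower than the match limit, on the
reasoning given above that such a value imposes no additional
constraint.

```shell
#!/bin/sh
# Hypothetical helper: print configure options for the two match limits.
# Usage: match_limit_args MATCH_LIMIT [RECURSION_LIMIT]
match_limit_args() {
  limit=$1
  recursion=${2:-$1}    # the recursion limit defaults to the match limit
  args="--with-match-limit=$limit"
  if [ "$recursion" -lt "$limit" ]; then
    args="$args --with-match-limit-recursion=$recursion"
  fi
  echo "$args"
}

# Print (rather than run) a configure command using the example values.
echo "./configure $(match_limit_args 500000 10000)"
```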


PCRE2 uses fixed tables for processing characters whose code points are
less than 256. By default, PCRE2 is built with a set of tables that are
distributed in the file src/pcre2_chartables.c.dist. These tables are
for ASCII codes only. If you add

  --enable-rebuild-chartables

to the configure command, the distributed tables are no longer used.
Instead, a program called dftables is compiled and run. This outputs
the source for a new set of tables, created in the default locale of
your C run-time system. (This method of replacing the tables does not
work if you are cross compiling, because dftables is run on the local
host. If you need to create alternative tables when cross compiling,
you will have to do so "by hand".)


PCRE2 assumes by default that it will run in an environment where the
character code is ASCII or Unicode, which is a superset of ASCII. This
is the case for most computer operating systems. PCRE2 can, however, be
compiled to run in an 8-bit EBCDIC environment by adding

  --enable-ebcdic --disable-unicode

to the configure command. This setting implies
--enable-rebuild-chartables. You should only use it if you know that
you are in an EBCDIC environment (for example, an IBM mainframe
operating system).

It is not possible to support both EBCDIC and UTF-8 codes in the same
version of the library. Consequently, --enable-unicode and
--enable-ebcdic are mutually exclusive.

The EBCDIC character that corresponds to an ASCII LF is assumed to have
the value 0x15 by default. However, in some EBCDIC environments, 0x25
is used. In such an environment you should use

  --enable-ebcdic-nl25

as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and
0x25 is not chosen as LF is made to correspond to the Unicode NEL
character (which, in Unicode, is 0x85).

The options that select newline behaviour, such as
--enable-newline-is-cr, and equivalent run-time options, refer to these
character values in an EBCDIC environment.
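
The choice between the two EBCDIC settings can be sketched as follows.
The helper name ebcdic_args is hypothetical. The sketch also assumes
that --disable-unicode is wanted alongside --enable-ebcdic-nl25, since
Unicode support is on by default and is described above as mutually
exclusive with EBCDIC.

```shell
#!/bin/sh
# Hypothetical helper: print configure options for an EBCDIC build.
# The argument is the hexadecimal code used for LF: 15 (default) or 25.
ebcdic_args() {
  case "${1:-15}" in
    15) echo "--enable-ebcdic --disable-unicode" ;;
    25) echo "--enable-ebcdic-nl25 --disable-unicode" ;;  # assumption: see above
    *)  echo "LF code must be 15 or 25" >&2; return 1 ;;
  esac
}

# Print (rather than run) the command for an environment where LF is 0x25.
echo "./configure $(ebcdic_args 25)"
```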


By default, on non-Windows systems, pcre2grep supports the use of
callouts with string arguments within the patterns it is matching, in
order to run external scripts. For details, see the pcre2grep
documentation. This support can be disabled by adding
--disable-pcre2grep-callout to the configure command.


By default, pcre2grep reads all files as plain text. You can build it
so that it recognizes files whose names end in .gz or .bz2, and reads
them with libz or libbz2, respectively, by adding one or both of

  --enable-pcre2grep-libz
  --enable-pcre2grep-libbz2

to the configure command. These options naturally require that the
relevant libraries are installed on your system. Configuration will
fail if they are not.


pcre2grep uses an internal buffer to hold a "window" on the file it is
scanning, in order to be able to output "before" and "after" lines when
it finds a match. The starting size of the buffer is controlled by a
parameter whose default value is 20K. The buffer itself is three times
this size, but because of the way it is used for holding "before"
lines, the longest line that is guaranteed to be processable is the
parameter size. If a longer line is encountered, pcre2grep
automatically expands the buffer, up to a specified maximum size, whose
default is 1M or the starting size, whichever is the larger. You can
change the default parameter values by adding, for example,

  --with-pcre2grep-bufsize=51200
  --with-pcre2grep-max-bufsize=2097152

to the configure command. The caller of pcre2grep can override these
values by using --buffer-size and --max-buffer-size on the command
line.
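
The arithmetic behind these sizes can be worked through in a short
shell sketch. The function name grep_buffer_sizes is hypothetical, and
the sketch assumes that "20K" and "1M" mean 20480 and 1048576 bytes
respectively; the text above does not spell out the exact byte counts.

```shell
#!/bin/sh
# Hypothetical helper: from a starting buffer parameter, compute the
# actual buffer size (three times the parameter) and the default
# maximum (1M or the starting size, whichever is larger).
grep_buffer_sizes() {
  start=${1:-20480}       # assumed byte value of the 20K default
  one_meg=1048576         # assumed byte value of 1M
  total=$((start * 3))
  if [ "$start" -gt "$one_meg" ]; then
    max=$start
  else
    max=$one_meg
  fi
  echo "start=$start total=$total default_max=$max"
}

grep_buffer_sizes          # the default parameter
grep_buffer_sizes 51200    # the example value from the text
```

With the example value of 51200, the longest line guaranteed to be
processable is 51200 bytes, while the buffer itself occupies 153600
bytes.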


If you add one of

  --enable-pcre2test-libreadline
  --enable-pcre2test-libedit

to the configure command, pcre2test is linked with the libreadline or
libedit library, respectively, and when its input is from a terminal,
it reads it using the readline() function. This provides line-editing
and history facilities. Note that libreadline is GPL-licensed, so if
you distribute a binary of pcre2test linked in this way, there may be
licensing issues. These can be avoided by linking instead with libedit,
which has a BSD licence.

Setting --enable-pcre2test-libreadline causes the -lreadline option to
be added to the pcre2test build. In many operating environments with a
system-installed readline library this is sufficient. However, in some
environments (e.g. if an unmodified distribution version of readline is
in use), some extra configuration may be necessary. The INSTALL file
for libreadline says this:

  "Readline uses the termcap functions, but does not link with
  the termcap or curses library itself, allowing applications
  which link with readline the to choose an appropriate library."

If your environment has not been set up so that an appropriate library
is automatically included, you may need to add something like

  LIBS="-lncurses"

immediately before the configure command.


If you add

  --enable-debug

to the configure command, additional debugging code is included in the
build. This feature is intended for use by the PCRE2 maintainers.


If you add

  --enable-valgrind

to the configure command, PCRE2 will use valgrind annotations to mark
certain memory regions as unaddressable. This allows it to detect
invalid memory accesses, and is mostly useful for debugging PCRE2
itself.


If your C compiler is gcc, you can build a version of PCRE2 that can
generate a code coverage report for its test suite. To enable this, you
must install lcov version 1.6 or above. Then specify

  --enable-coverage

to the configure command and build PCRE2 in the usual way.

Note that using ccache (a caching C compiler) is incompatible with code
coverage reporting. If you have configured ccache to run automatically
on your system, you must set the environment variable

  CCACHE_DISABLE=1

before running make to build PCRE2, so that ccache is not used.

When --enable-coverage is used, the following additional targets are
added to the Makefile:

  make coverage

This creates a fresh coverage report for the PCRE2 test suite. It is
equivalent to running "make coverage-reset", "make coverage-baseline",
"make check", and then "make coverage-report".

  make coverage-reset

This zeroes the coverage counters, but does nothing else.

  make coverage-baseline

This captures baseline coverage information.

  make coverage-report

This creates the coverage report.

  make coverage-clean-report

This removes the generated coverage report without cleaning the
coverage data itself.

  make coverage-clean-data

This removes the captured coverage data without removing the coverage
files created at compile time (*.gcno).

  make coverage-clean

This cleans all coverage data including the generated coverage report.
For more information about code coverage, see the gcov and lcov
documentation.
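
The sequence that "make coverage" is described as running can be
written out as a short shell sketch. The function name run_coverage is
hypothetical. The sketch only prints the commands; it does not invoke
make, so it can be read or run outside a configured source tree.

```shell
#!/bin/sh
# The four steps that "make coverage" is described as being equivalent to.
coverage_steps="coverage-reset coverage-baseline check coverage-report"

# Hypothetical helper: print each step in order.
run_coverage() {
  for target in $coverage_steps; do
    echo "make $target"
    # To execute instead of print, in a tree configured with
    # --enable-coverage, replace the echo with:
    #   make "$target" || return 1
    # If ccache runs automatically on the system, CCACHE_DISABLE=1
    # must be set when PCRE2 itself is built, as noted above.
  done
}

run_coverage
```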


There is a special option for use by people who want to run fuzzing
tests on PCRE2:

  --enable-fuzz-support

At present this applies only to the 8-bit library. If set, it causes an
extra library called libpcre2-fuzzsupport.a to be built, but not
installed. This contains a single function called
LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and
the length of the string. When called, this function tries to compile
the string as a pattern, and if that succeeds, to match it. This is
done both with no options and with some random option bits that are
generated from the string. Setting --enable-fuzz-support also causes a
binary called pcre2fuzzcheck to be created. This is normally run under
valgrind or used when PCRE2 is compiled with address sanitizing
enabled. It calls the fuzzing function and outputs information about
what it is doing. The input strings are specified by arguments: if an
argument starts with "=" the rest of it is a literal input string.
Otherwise, it is assumed to be a file name, and the contents of the
file are the test string.
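
The argument convention used by pcre2fuzzcheck can be illustrated with
a small shell sketch. The helper describe_fuzz_arg is hypothetical and
only mirrors the rule stated above; it does not call pcre2fuzzcheck or
reproduce its output.

```shell
#!/bin/sh
# Hypothetical helper: classify an argument the way the text describes.
# A leading "=" marks a literal input string; anything else is taken to
# be a file name.
describe_fuzz_arg() {
  case "$1" in
    =*) echo "literal: ${1#=}" ;;
    *)  echo "file: $1" ;;
  esac
}

describe_fuzz_arg "=a(b|c)*"
describe_fuzz_arg "testdata.txt"   # hypothetical file name
```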


pcre2api(3), pcre2-config(3).


Philip Hazel
University Computing Service
Cambridge, England.


Last updated: 01 November 2016
Copyright (c) 1997-2016 University of Cambridge.

PCRE2 10.23 01 November 2016 PCRE2BUILD(3)