PCRE2BUILD(3)              Library Functions Manual              PCRE2BUILD(3)

NAME

       PCRE2 - Perl-compatible regular expressions (revised API)

BUILDING PCRE2

       PCRE2  is distributed with a configure script that can be used to build
       the library in Unix-like environments using the applications  known  as
       Autotools. Also in the distribution are files to support building using
       CMake instead of configure.  The  text  file  README  contains  general
       information  about  building  with Autotools (some of which is repeated
       below), and also has some comments about building on various  operating
       systems.  There  is a lot more information about building PCRE2 without
       using Autotools (including information about using CMake  and  building
       "by  hand")  in  the  text file called NON-AUTOTOOLS-BUILD.  You should
       consult this file as well as the README file if you are building  in  a
       non-Unix-like environment.

PCRE2 BUILD-TIME OPTIONS

       The rest of this document describes the optional features of PCRE2 that
       can be selected when the library is compiled. It  assumes  use  of  the
       configure  script,  where  the  optional features are selected or dese‐
       lected by providing options to configure before running the  make  com‐
       mand.  However,  the same options can be selected in both Unix-like and
       non-Unix-like environments if you are using CMake instead of  configure
       to build PCRE2.

       If  you  are not using Autotools or CMake, option selection can be done
       by editing the config.h file, or by passing parameter settings  to  the
       compiler, as described in NON-AUTOTOOLS-BUILD.

       The complete list of options for configure (which includes the standard
       ones such as the  selection  of  the  installation  directory)  can  be
       obtained by running

         ./configure --help

       The  following  sections  include  descriptions  of options whose names
       begin with --enable or --disable. These settings specify changes to the
       defaults  for  the configure command. Because of the way that configure
       works, --enable and --disable always come in pairs, so  the  complemen‐
       tary  option always exists as well, but as it specifies the default, it
       is not described.

BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES

       By default, a library called libpcre2-8 is built, containing  functions
       that  take  string arguments contained in vectors of bytes, interpreted
       either as single-byte characters, or UTF-8 strings. You can also  build
       two  other libraries, called libpcre2-16 and libpcre2-32, which process
       strings that are contained in vectors of 16-bit and 32-bit code  units,
       respectively. These can be interpreted either as single-unit characters
       or UTF-16/UTF-32 strings. To build these additional libraries, add  one
       or both of the following to the configure command:

         --enable-pcre2-16
         --enable-pcre2-32

       If you do not want the 8-bit library, add

         --disable-pcre2-8

       as  well.  At least one of the three libraries must be built. Note that
       the POSIX wrapper is for the 8-bit library only, and that pcre2grep  is
       an 8-bit program. Neither of these is built if you select only the
       16-bit or 32-bit libraries.
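
       As a rough illustration of the options above, the following shell
       sketch assembles the relevant configure options from three yes/no
       settings and rejects the one combination that would build no library
       at all. The function name pcre2_width_opts is hypothetical; only the
       option names come from this document. It prints the options rather
       than running configure.

```shell
# Hypothetical helper: print the configure options for the requested
# code unit widths. Arguments are "yes" or "no" for 8, 16 and 32 bits.
pcre2_width_opts() {
  want8=$1 want16=$2 want32=$3
  # At least one of the three libraries must be built.
  if [ "$want8$want16$want32" = "nonono" ]; then
    echo "error: at least one library must be built" >&2
    return 1
  fi
  opts=""
  # The 8-bit library is built by default, so it only needs disabling.
  [ "$want8" = "no" ] && opts="$opts --disable-pcre2-8"
  [ "$want16" = "yes" ] && opts="$opts --enable-pcre2-16"
  [ "$want32" = "yes" ] && opts="$opts --enable-pcre2-32"
  # Strip the leading space, if any, before printing.
  echo "${opts# }"
}

pcre2_width_opts yes yes yes   # all three libraries
pcre2_width_opts no yes no     # 16-bit library only
```

       The printed string can be appended to a ./configure command line once
       it has been reviewed.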

BUILDING SHARED AND STATIC LIBRARIES

       The Autotools PCRE2 building process uses libtool to build both  shared
       and  static  libraries by default. You can suppress an unwanted library
       by adding one of

         --disable-shared
         --disable-static

       to the configure command.

UNICODE AND UTF SUPPORT

       By default, PCRE2 is built with support for Unicode and  UTF  character
       strings.  To build it without Unicode support, add

         --disable-unicode

       to  the configure command. This setting applies to all three libraries.
       It is not possible to build  one  library  with  Unicode  support,  and
       another without, in the same configuration.

       Of  itself, Unicode support does not make PCRE2 treat strings as UTF-8,
       UTF-16 or UTF-32. To do that, applications that use the library can set
       the  PCRE2_UTF  option when they call pcre2_compile() to compile a pat‐
       tern.  Alternatively, patterns may be started with  (*UTF)  unless  the
       application has locked this out by setting PCRE2_NEVER_UTF.

       UTF support allows the libraries to process character code points up to
       0x10ffff in the strings that they handle. It also provides support  for
       accessing  the  Unicode  properties  of  such characters, using pattern
       escapes such as \P, \p, and \X. Only the  general  category  properties
       such  as Lu and Nd are supported. Details are given in the pcre2pattern
       documentation.

       Pattern escapes such as \d and \w do not by default make use of Unicode
       properties.  The  application  can  request that they do by setting the
       PCRE2_UCP option. Unless the application  has  set  PCRE2_NEVER_UCP,  a
       pattern may also request this by starting with (*UCP).

DISABLING THE USE OF \C

       The \C escape sequence, which matches a single code unit, even in a UTF
       mode, can cause unpredictable behaviour because it may leave  the  cur‐
       rent  matching  point in the middle of a multi-code-unit character. The
       application can lock it  out  by  setting  the  PCRE2_NEVER_BACKSLASH_C
       option when calling pcre2_compile(). There is also a build-time option

         --enable-never-backslash-C

       (note the upper case C) which locks out the use of \C entirely.

JUST-IN-TIME COMPILER SUPPORT

       Just-in-time compiler support is included in the build by specifying

         --enable-jit

       This  support  is available only for certain hardware architectures. If
       this option is set for an unsupported architecture,  a  building  error
       occurs.   See the pcre2jit documentation for a discussion of JIT usage.
       When JIT support is enabled, pcre2grep automatically makes use  of  it,
       unless you add

         --disable-pcre2grep-jit

       to the configure command.

NEWLINE RECOGNITION

       By  default, PCRE2 interprets the linefeed (LF) character as indicating
       the end of a line. This is the normal newline  character  on  Unix-like
       systems.  You can compile PCRE2 to use carriage return (CR) instead, by
       adding

         --enable-newline-is-cr

       to the configure  command.  There  is  also  an  --enable-newline-is-lf
       option, which explicitly specifies linefeed as the newline character.

       Alternatively, you can specify that line endings are to be indicated by
       the two-character sequence CRLF (CR immediately followed by LF). If you
       want this, add

         --enable-newline-is-crlf

       to the configure command. There is a fourth option, specified by

         --enable-newline-is-anycrlf

       which  causes  PCRE2 to recognize any of the three sequences CR, LF, or
       CRLF as indicating a line ending. Finally, a fifth option, specified by

         --enable-newline-is-any

       causes PCRE2 to recognize any Unicode  newline  sequence.  The  Unicode
       newline sequences are the three just mentioned, plus the single charac‐
       ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
       U+0085),  LS  (line  separator,  U+2028),  and PS (paragraph separator,
       U+2029).

       Whatever default line ending convention is selected when PCRE2 is built
       can  be  overridden by applications that use the library. At build time
       it is conventional to use the standard for your operating system.
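
       The five newline options share one naming pattern, so a build script
       can derive the option from a short convention name. The sketch below
       is one way to do that; the function name pcre2_newline_opt is
       hypothetical, and the function only prints the option.

```shell
# Hypothetical helper: map a newline convention name to its configure
# option, accepting only the five conventions described above.
pcre2_newline_opt() {
  case "$1" in
    cr|lf|crlf|anycrlf|any)
      echo "--enable-newline-is-$1" ;;
    *)
      echo "error: unknown newline convention: $1" >&2
      return 1 ;;
  esac
}

pcre2_newline_opt crlf      # a CRLF default
pcre2_newline_opt anycrlf   # accept CR, LF, or CRLF
```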

WHAT \R MATCHES

       By default, the sequence \R in a pattern matches  any  Unicode  newline
       sequence,  independently  of  what has been selected as the line ending
       sequence. If you specify

         --enable-bsr-anycrlf

       the default is changed so that \R matches only CR, LF, or  CRLF.  What‐
       ever  is selected when PCRE2 is built can be overridden by applications
       that use the library.

HANDLING VERY LARGE PATTERNS

       Within a compiled pattern, offset values are used  to  point  from  one
       part  to another (for example, from an opening parenthesis to an alter‐
       nation metacharacter). By default, in the 8-bit and  16-bit  libraries,
       two-byte  values  are used for these offsets, leading to a maximum size
       for a compiled pattern of around 64K code units. This is sufficient  to
       handle all but the most gigantic patterns. Nevertheless, some people do
       want to process truly enormous patterns, so it is possible  to  compile
       PCRE2  to  use three-byte or four-byte offsets by adding a setting such
       as

         --with-link-size=3

       to the configure command. The value given must be 2, 3, or 4.  For  the
       16-bit  library,  a  value of 3 is rounded up to 4. In these libraries,
       using longer offsets slows down the operation of PCRE2 because  it  has
       to  load additional data when handling them. For the 32-bit library the
       value is always 4 and cannot be overridden; the value  of  --with-link-
       size is ignored.
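
       The figure of around 64K for two-byte offsets follows from the offset
       width: an n-byte offset can distinguish 2^(8n) values. The sketch
       below prints that bound for each permitted link size. It is plain
       arithmetic, not a statement of the exact limit that PCRE2 enforces,
       and it assumes a shell with 64-bit arithmetic. The function name
       link_size_bound is hypothetical.

```shell
# Number of distinct values an offset of the given byte width can hold.
link_size_bound() {
  echo $(( 1 << (8 * $1) ))
}

# Show the bound for each value accepted by --with-link-size.
for n in 2 3 4; do
  echo "--with-link-size=$n: $(link_size_bound "$n")"
done
```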

AVOIDING EXCESSIVE STACK USAGE

       When  matching  with the pcre2_match() function, PCRE2 implements back‐
       tracking by making recursive  calls  to  an  internal  function  called
       match().  In  environments where the size of the stack is limited, this
       can severely limit PCRE2's operation. (The Unix  environment  does  not
       usually  suffer from this problem, but it may sometimes be necessary to
       increase  the  maximum  stack  size.  There  is  a  discussion  in  the
       pcre2stack  documentation.)  An  alternative approach to recursion that
       uses memory from the heap to remember data, instead of using  recursive
       function  calls, has been implemented to work round the problem of lim‐
       ited stack size. If you want to build a version  of  PCRE2  that  works
       this way, add

         --disable-stack-for-recursion

       to the configure command. By default, the system functions malloc() and
       free() are called to manage the heap memory that is required, but  cus‐
       tom  memory  management  functions  can  be  called instead. PCRE2 runs
       noticeably more slowly when built in this way. This option affects only
       the pcre2_match() function; it is not relevant for pcre2_dfa_match().

LIMITING PCRE2 RESOURCE USAGE

       Internally, PCRE2 has a function called match(), which it calls repeat‐
       edly  (sometimes  recursively)  when  matching  a  pattern   with   the
       pcre2_match() function. By controlling the maximum number of times this
       function may be called during a single matching operation, a limit  can
       be  placed on the resources used by a single call to pcre2_match(). The
       limit can be changed at run time, as described in the pcre2api documen‐
       tation.  The default is 10 million, but this can be changed by adding a
       setting such as

         --with-match-limit=500000

       to  the  configure  command.  This  setting  has  no  effect   on   the
       pcre2_dfa_match() matching function.

       In  some  environments  it is desirable to limit the depth of recursive
       calls of match() more strictly than the total number of calls, in order
       to  restrict  the maximum amount of stack (or heap, if --disable-stack-
       for-recursion is specified) that is used. A second limit controls this;
       it  defaults  to  the  value  that is set for --with-match-limit, which
       imposes no additional constraints. However, you can set a  lower  limit
       by adding, for example,

         --with-match-limit-recursion=10000

       to  the  configure  command.  This  value can also be overridden at run
       time.
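
       Because the recursion depth can never exceed the total number of
       calls, a recursion limit larger than the match limit adds no
       constraint. The sketch below prints both options and, as a design
       choice of this example rather than a rule of configure, refuses such
       a pair. The function name pcre2_limit_opts is hypothetical.

```shell
# Hypothetical helper: print the two limit options. A recursion limit
# above the match limit is rejected because it would have no effect.
pcre2_limit_opts() {
  match_limit=$1 recursion_limit=$2
  if [ "$recursion_limit" -gt "$match_limit" ]; then
    echo "error: recursion limit exceeds match limit" >&2
    return 1
  fi
  echo "--with-match-limit=$match_limit --with-match-limit-recursion=$recursion_limit"
}

pcre2_limit_opts 500000 10000   # the example values used above
```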

CREATING CHARACTER TABLES AT BUILD TIME

       PCRE2 uses fixed tables for processing characters whose code points are
       less than 256. By default, PCRE2 is built with a set of tables that are
       distributed in the file src/pcre2_chartables.c.dist. These  tables  are
       for ASCII codes only. If you add

         --enable-rebuild-chartables

       to  the  configure  command, the distributed tables are no longer used.
       Instead, a program called dftables is compiled and  run.  This  outputs
       the source for a new set of tables, created in the default locale of
       your C run-time system. (This method of replacing the tables does not
       work if you are cross compiling, because dftables is run on the local
       host. If you need to create alternative tables when cross compiling,
       you will have to do so "by hand".)

USING EBCDIC CODE

       PCRE2  assumes  by default that it will run in an environment where the
       character code is ASCII or Unicode, which is a superset of ASCII.  This
       is the case for most computer operating systems. PCRE2 can, however, be
       compiled to run in an 8-bit EBCDIC environment by adding

         --enable-ebcdic --disable-unicode

       to the configure command. This setting implies --enable-rebuild-charta‐
       bles.  You  should  only  use  it if you know that you are in an EBCDIC
       environment (for example, an IBM mainframe operating system).

       It is not possible to support both EBCDIC and UTF-8 codes in  the  same
       version  of  the  library. Consequently, --enable-unicode and --enable-
       ebcdic are mutually exclusive.

       The EBCDIC character that corresponds to an ASCII LF is assumed to have
       the  value  0x15 by default. However, in some EBCDIC environments, 0x25
       is used. In such an environment you should use

         --enable-ebcdic-nl25

       as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
       has  the  same  value  as in ASCII, namely, 0x0d. Whichever of 0x15 and
       0x25 is not chosen as LF is made to correspond to the Unicode NEL char‐
       acter (which, in Unicode, is 0x85).

       The options that select newline behaviour, such as --enable-newline-is-
       cr, and equivalent run-time options, refer to these character values in
       an EBCDIC environment.
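
       Since Unicode support is on by default, an EBCDIC build has to turn
       it off explicitly. The sketch below checks a list of configure
       options for that conflict before configure is run. The function name
       pcre2_check_ebcdic is hypothetical, and the sketch assumes that
       --enable-ebcdic-nl25 carries the same restriction as --enable-ebcdic.

```shell
# Hypothetical check: succeed only if the option list does not combine
# EBCDIC with Unicode support (Unicode is enabled by default).
pcre2_check_ebcdic() {
  ebcdic=no unicode=yes
  for opt in "$@"; do
    case "$opt" in
      --enable-ebcdic|--enable-ebcdic-nl25) ebcdic=yes ;;
      --disable-unicode) unicode=no ;;
      --enable-unicode) unicode=yes ;;
    esac
  done
  if [ "$ebcdic" = yes ] && [ "$unicode" = yes ]; then
    echo "error: EBCDIC requires --disable-unicode" >&2
    return 1
  fi
  return 0
}

pcre2_check_ebcdic --enable-ebcdic --disable-unicode && echo "ok"
```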

PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS

       By default, on non-Windows systems, pcre2grep supports the use of call‐
       outs with string arguments within the patterns it is matching, in order
       to  run external scripts. For details, see the pcre2grep documentation.
       This support can be disabled by adding  --disable-pcre2grep-callout  to
       the configure command.

PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT

       By  default,  pcre2grep reads all files as plain text. You can build it
       so that it recognizes files whose names end in .gz or .bz2,  and  reads
       them with libz or libbz2, respectively, by adding one or both of

         --enable-pcre2grep-libz
         --enable-pcre2grep-libbz2

       to the configure command. These options naturally require that the rel‐
       evant libraries are installed on your system. Configuration  will  fail
       if they are not.

PCRE2GREP BUFFER SIZE

       pcre2grep  uses an internal buffer to hold a "window" on the file it is
       scanning, in order to be able to output "before" and "after" lines when
       it  finds  a  match. The starting size of the buffer is controlled by a
       parameter whose default value is 20K. The buffer itself is three  times
       this  size,  but  because  of  the  way it is used for holding "before"
       lines, the longest line that is guaranteed to  be  processable  is  the
       parameter  size.  If  a longer line is encountered, pcre2grep automati‐
       cally expands the buffer, up to a specified maximum size, whose default
       is 1M or the starting size, whichever is the larger. You can change the
       default parameter values by adding, for example,

         --with-pcre2grep-bufsize=51200
         --with-pcre2grep-max-bufsize=2097152

       to the configure command. The caller of pcre2grep  can  override  these
       values  by  using  --buffer-size  and  --max-buffer-size on the command
       line.
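
       The relationship between the size parameter, the buffer itself, and
       the default maximum can be worked through with shell arithmetic. The
       sketch below assumes that K and M mean 1024 and 1048576 bytes, which
       matches the example values 51200 and 2097152; the function names are
       hypothetical.

```shell
# Size of the initial buffer: three times the starting-size parameter.
grep_buffer_bytes() {
  echo $(( 3 * $1 ))
}

# Default maximum: 1M (1048576) or the starting size, whichever is larger.
grep_default_max() {
  if [ "$1" -gt 1048576 ]; then echo "$1"; else echo 1048576; fi
}

grep_buffer_bytes 20480     # default 20K parameter
grep_buffer_bytes 51200     # the example value above
grep_default_max 51200      # starting size below 1M
grep_default_max 4194304    # starting size above 1M
```

       In each case the longest line guaranteed to be processable without
       expansion is the parameter value itself, not the tripled buffer size.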

PCRE2TEST OPTION FOR LIBREADLINE SUPPORT

       If you add one of

         --enable-pcre2test-libreadline
         --enable-pcre2test-libedit

       to the configure command, pcre2test is linked with the libreadline or
       libedit library, respectively, and when its input is from a terminal,
       it reads it using the readline() function. This  provides  line-editing
       and  history  facilities.  Note that libreadline is GPL-licensed, so if
       you distribute a binary of pcre2test linked in this way, there  may  be
       licensing issues. These can be avoided by linking instead with libedit,
       which has a BSD licence.

       Setting --enable-pcre2test-libreadline causes the -lreadline option  to
       be  added to the pcre2test build. In many operating environments with a
       system-installed readline library this is sufficient. However, in some
       environments (e.g. if an unmodified distribution version of readline is
       in use), some extra configuration may be necessary.  The  INSTALL  file
       for libreadline says this:

         "Readline uses the termcap functions, but does not link with
         the termcap or curses library itself, allowing applications
         which link with readline the to choose an appropriate library."

       If  your environment has not been set up so that an appropriate library
       is automatically included, you may need to add something like

         LIBS="-lncurses"

       immediately before the configure command.
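
       Putting the pieces together, the environment setting and the option
       go on one command line. The linker option for a library named
       ncurses is spelled -lncurses. The sketch below only prints such a
       command for a given termcap-providing library so that it can be
       checked before use; the function name readline_configure_cmd is
       hypothetical, and which library is appropriate depends on your
       system.

```shell
# Print (rather than run) a configure command that links pcre2test with
# readline and names a library supplying the termcap functions.
readline_configure_cmd() {
  echo "LIBS=\"-l$1\" ./configure --enable-pcre2test-libreadline"
}

readline_configure_cmd ncurses
```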

INCLUDING DEBUGGING CODE

       If you add

         --enable-debug

       to the configure command, additional debugging code is included in  the
       build. This feature is intended for use by the PCRE2 maintainers.

DEBUGGING WITH VALGRIND SUPPORT

       If you add

         --enable-valgrind

       to  the  configure command, PCRE2 will use valgrind annotations to mark
       certain memory regions as  unaddressable.  This  allows  it  to  detect
       invalid  memory  accesses,  and  is  mostly  useful for debugging PCRE2
       itself.

CODE COVERAGE REPORTING

       If your C compiler is gcc, you can build a version of  PCRE2  that  can
       generate a code coverage report for its test suite. To enable this, you
       must install lcov version 1.6 or above. Then specify

         --enable-coverage

       to the configure command and build PCRE2 in the usual way.

       Note that using ccache (a caching C compiler) is incompatible with code
       coverage  reporting. If you have configured ccache to run automatically
       on your system, you must set the environment variable

         CCACHE_DISABLE=1

       before running make to build PCRE2, so that ccache is not used.

       When --enable-coverage is used, the following additional targets are
       added to the Makefile:

         make coverage

       This  creates  a  fresh coverage report for the PCRE2 test suite. It is
       equivalent to running "make coverage-reset", "make  coverage-baseline",
       "make check", and then "make coverage-report".

         make coverage-reset

       This zeroes the coverage counters, but does nothing else.

         make coverage-baseline

       This captures baseline coverage information.

         make coverage-report

       This creates the coverage report.

         make coverage-clean-report

       This  removes the generated coverage report without cleaning the cover‐
       age data itself.

         make coverage-clean-data

       This removes the captured coverage data without removing  the  coverage
       files created at compile time (*.gcno).

         make coverage-clean

       This  cleans all coverage data including the generated coverage report.
       For more information about code coverage, see the gcov and  lcov  docu‐
       mentation.
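
       The sequence that "make coverage" stands for can be written out
       step by step, which is convenient when only part of it needs to be
       repeated. The sketch below prints the four commands in order, each
       with CCACHE_DISABLE=1 set so that ccache is not used; it does not
       run make. The function name coverage_steps is hypothetical.

```shell
# Print the commands that "make coverage" is described as equivalent to,
# in order, with ccache disabled for each one.
coverage_steps() {
  for target in coverage-reset coverage-baseline check coverage-report; do
    echo "CCACHE_DISABLE=1 make $target"
  done
}

coverage_steps
```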

SUPPORT FOR FUZZERS

       There  is  a  special  option for use by people who want to run fuzzing
       tests on PCRE2:

         --enable-fuzz-support

       At present this applies only to the 8-bit library. If set, it causes an
       extra  library  called  libpcre2-fuzzsupport.a  to  be  built,  but not
       installed. This contains a single function called  LLVMFuzzerTestOneIn‐
       put()  whose  arguments are a pointer to a string and the length of the
       string. When called, this function tries to compile  the  string  as  a
       pattern,  and if that succeeds, to match it.  This is done both with no
       options and with some random option bits that are generated from the
       string.  Setting  --enable-fuzz-support  also  causes  a  binary called
       pcre2fuzzcheck to be created. This is normally run  under  valgrind  or
       used  when  PCRE2 is compiled with address sanitizing enabled. It calls
       the fuzzing function and outputs information about what it is doing.
       The input strings are specified by arguments: if an argument starts
       with "=" the rest of it is a literal input string. Otherwise, it is
       assumed to be a file name, and the contents of the file are the test
       string.
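
       The argument rule for pcre2fuzzcheck can be restated as a small
       shell function. This is an illustration of the rule as described
       here, not code taken from pcre2fuzzcheck; the function name
       classify_fuzz_arg and its output format are hypothetical.

```shell
# Print "literal:<text>" for an argument that starts with "=", and
# "file:<name>" for anything else, following the rule described above.
classify_fuzz_arg() {
  case "$1" in
    =*) echo "literal:${1#=}" ;;
    *)  echo "file:$1" ;;
  esac
}

classify_fuzz_arg '=a(b|c)+'     # literal input string a(b|c)+
classify_fuzz_arg pattern.txt    # contents of pattern.txt are the input
```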

SEE ALSO

       pcre2api(3), pcre2-config(3).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.

REVISION

       Last updated: 01 November 2016
       Copyright (c) 1997-2016 University of Cambridge.



PCRE2 10.23                    01 November 2016                  PCRE2BUILD(3)