PCRE2SERIALIZE(3)          Library Functions Manual          PCRE2SERIALIZE(3)

NAME

       PCRE2 - Perl-compatible regular expressions (revised API)

SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS

       int32_t pcre2_serialize_decode(pcre2_code **codes,
         int32_t number_of_codes, const uint8_t *bytes,
         pcre2_general_context *gcontext);

       int32_t pcre2_serialize_encode(const pcre2_code **codes,
         int32_t number_of_codes, uint8_t **serialized_bytes,
         PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);

       void pcre2_serialize_free(uint8_t *bytes);

       int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes);

       If you are running an application that uses a large number of regular
       expression patterns, it may be useful to store them in a precompiled
       form instead of having to compile them every time the application is
       run. However, if you are using the just-in-time optimization feature,
       it is not possible to save and reload the JIT data, because it is
       position-dependent. The host on which the patterns are reloaded must
       be running the same version of PCRE2, with the same code unit width,
       and must also have the same endianness, pointer width and PCRE2_SIZE
       type. For example, patterns compiled on a 32-bit system using PCRE2's
       16-bit library cannot be reloaded on a 64-bit system, nor can they be
       reloaded using the 8-bit library.
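
       The restrictions above can be checked by the application before a
       saved stream is handed to the decoder. The following is a minimal
       sketch and not part of PCRE2: the env_stamp structure and the
       make_env_stamp() and env_stamp_matches() helpers are hypothetical
       names. It assumes that the application stores the stamp alongside the
       serialized data and supplies the version string itself, for example
       from pcre2_config() with PCRE2_CONFIG_VERSION.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record that an application saves next to a serialized
   stream. None of these names belong to the PCRE2 API. */
typedef struct {
  uint32_t byte_order;      /* 0x01020304 as written by the saving host */
  uint8_t  pointer_bytes;   /* sizeof(void *) on the saving host */
  uint8_t  size_bytes;      /* sizeof(size_t); PCRE2_SIZE is size_t */
  uint8_t  code_unit_bits;  /* 8, 16, or 32 */
  char     version[32];     /* PCRE2 version string, NUL-padded */
} env_stamp;

/* Fill in a stamp describing the current host and library. */
void make_env_stamp(env_stamp *s, int code_unit_bits, const char *version)
{
  memset(s, 0, sizeof(*s));
  s->byte_order = 0x01020304u;
  s->pointer_bytes = (uint8_t)sizeof(void *);
  s->size_bytes = (uint8_t)sizeof(size_t);
  s->code_unit_bits = (uint8_t)code_unit_bits;
  strncpy(s->version, version, sizeof(s->version) - 1);
}

/* Yield 1 if a stamp read back from storage matches this environment,
   otherwise 0. A stamp written on a host with the opposite byte order
   reads back with a different byte_order value and so fails the test. */
int env_stamp_matches(const env_stamp *saved, const env_stamp *now)
{
  return saved->byte_order == now->byte_order &&
         saved->pointer_bytes == now->pointer_bytes &&
         saved->size_bytes == now->size_bytes &&
         saved->code_unit_bits == now->code_unit_bits &&
         strncmp(saved->version, now->version, sizeof(saved->version)) == 0;
}
```

       A loader built this way would rebuild the stamp for the running
       process and recompile the patterns from their sources whenever
       env_stamp_matches() yields 0.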

       Note that "serialization" in PCRE2 does not convert compiled patterns
       to an abstract format like Java or .NET serialization. The serialized
       output is really just a bytecode dump, which is why it can only be
       reloaded in the same environment as the one that created it. Hence
       the restrictions mentioned above. Applications that are not statically
       linked with a fixed version of PCRE2 must be prepared to recompile
       patterns from their sources, in order to be immune to PCRE2 upgrades.

SECURITY CONCERNS

       The facility for saving and restoring compiled patterns is intended for
       use within individual applications. As such, the data supplied to
       pcre2_serialize_decode() is expected to be trusted data, not data from
       arbitrary external sources. There is only some simple consistency
       checking, not complete validation of what is being re-loaded. Corrupted
       data may cause undefined results. For example, if the length field of a
       pattern in the serialized data is corrupted, the deserializing code may
       read beyond the end of the byte stream that is passed to it.
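
       Because decoding performs only simple consistency checks, an
       application may wish to detect accidental corruption of its own saved
       data before calling pcre2_serialize_decode(). The sketch below uses a
       64-bit FNV-1a hash, which is a choice made for this illustration and
       not something that PCRE2 provides; stream_checksum() is a hypothetical
       helper. It assumes that the application stores the hash next to the
       stream when saving and compares it when loading. A plain hash guards
       against accidental damage only. It does not make data from untrusted
       sources safe to decode.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash over a byte stream (hypothetical helper, not part
   of the PCRE2 API). */
uint64_t stream_checksum(const uint8_t *bytes, size_t length)
{
  uint64_t hash = UINT64_C(0xcbf29ce484222325);   /* FNV offset basis */
  for (size_t i = 0; i < length; i++) {
    hash ^= bytes[i];
    hash *= UINT64_C(0x100000001b3);              /* FNV prime */
  }
  return hash;
}
```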

SAVING COMPILED PATTERNS

       Before compiled patterns can be saved they must be serialized, which in
       PCRE2 means converting the pattern to a stream of bytes. A single byte
       stream may contain any number of compiled patterns, but they must all
       use the same character tables. A single copy of the tables is included
       in the byte stream (its size is 1088 bytes). For more details of
       character tables, see the section on locale support in the pcre2api
       documentation.

       The function pcre2_serialize_encode() creates a serialized byte stream
       from a list of compiled patterns. Its first two arguments specify the
       list, being a pointer to a vector of pointers to compiled patterns, and
       the length of the vector. The third and fourth arguments point to
       variables which are set to point to the created byte stream and its
       length, respectively. The final argument is a pointer to a general
       context, which can be used to specify custom memory management
       functions. If this argument is NULL, malloc() is used to obtain memory
       for the byte stream. The yield of the function is the number of
       serialized patterns, or one of the following negative error codes:

         PCRE2_ERROR_BADDATA      the number of patterns is zero or less
         PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
         PCRE2_ERROR_NOMEMORY     memory allocation failed
         PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
         PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL

       PCRE2_ERROR_BADMAGIC means either that a pattern's code has been
       corrupted, or that a slot in the vector does not point to a compiled
       pattern.

       Once a set of patterns has been serialized you can save the data in any
       appropriate manner. Here is sample code that compiles two patterns and
       writes them to a file. It assumes that the variable fd is a stdio
       stream (FILE *) that is open for output. The error checking that should
       be present in a real application has been omitted for simplicity.

         int errorcode;
         uint8_t *bytes;
         PCRE2_SIZE erroroffset;
         PCRE2_SIZE bytescount;
         pcre2_code *list_of_codes[2];

         /* Compile the two patterns with default options and contexts. */
         list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
           PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
         list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
           PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);

         /* Serialize both patterns into one malloc()ed byte stream. */
         errorcode = pcre2_serialize_encode(
           (const pcre2_code **)list_of_codes, 2, &bytes, &bytescount, NULL);

         /* Write the stream; fwrite() yields the number of bytes written. */
         errorcode = fwrite(bytes, 1, bytescount, fd);

       Note that the serialized data is binary data that may contain any of
       the 256 possible byte values. On systems that make a distinction
       between binary and non-binary data, be sure that the file is opened for
       binary output.
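
       With the standard C library, binary output is requested by adding "b"
       to the fopen() mode. The following is a minimal sketch using only
       stdio; save_stream() is a hypothetical helper, and the file name is
       whatever the application chooses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write a byte stream to a file opened in binary mode (hypothetical
   helper). Yields 0 on success and -1 on any I/O failure. */
int save_stream(const char *path, const uint8_t *bytes, size_t length)
{
  FILE *f = fopen(path, "wb");   /* "b" matters where text mode differs */
  if (f == NULL) return -1;

  size_t written = fwrite(bytes, 1, length, f);
  int close_failed = (fclose(f) != 0);   /* buffered data is flushed here */

  return (written == length && !close_failed) ? 0 : -1;
}
```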

       Serializing a set of patterns leaves the original patterns untouched,
       so they can still be used for matching. Their memory must eventually be
       freed in the usual way by calling pcre2_code_free(). When you have
       finished with the byte stream, it too must be freed by calling
       pcre2_serialize_free(). If this function is called with a NULL
       argument, it returns immediately without doing anything.

RE-USING PRECOMPILED PATTERNS

       In order to re-use a set of saved patterns you must first make the
       serialized byte stream available in main memory (for example, by
       reading from a file). The management of this memory block is up to the
       application. You can use the pcre2_serialize_get_number_of_codes()
       function to find out how many compiled patterns are in the serialized
       data without actually decoding the patterns:

         uint8_t *bytes = <serialized data>;
         int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
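
       One way to obtain the <serialized data> shown above is to read the
       whole file into a block obtained with malloc(). The following is a
       minimal sketch using only the standard C library; load_stream() is a
       hypothetical helper, and because the application manages this memory
       itself, the block is released with free() and not with
       pcre2_serialize_free().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file, opened in binary mode, into a malloc()ed block
   (hypothetical helper). Yields the block, or NULL on failure. On
   success *length receives the number of bytes read, and the caller
   releases the block with free(). */
uint8_t *load_stream(const char *path, size_t *length)
{
  FILE *f = fopen(path, "rb");
  if (f == NULL) return NULL;

  uint8_t *bytes = NULL;
  size_t used = 0, capacity = 0;
  for (;;) {
    if (used == capacity) {
      /* Grow the block geometrically so that large files stay cheap. */
      size_t bigger = (capacity == 0) ? 4096 : capacity * 2;
      uint8_t *grown = realloc(bytes, bigger);
      if (grown == NULL) { free(bytes); fclose(f); return NULL; }
      bytes = grown;
      capacity = bigger;
    }
    size_t got = fread(bytes + used, 1, capacity - used, f);
    used += got;
    if (got == 0) break;   /* end of file, or a read error */
  }

  if (ferror(f)) { free(bytes); bytes = NULL; used = 0; }
  fclose(f);
  *length = used;
  return bytes;
}
```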

       The pcre2_serialize_decode() function reads a byte stream and recreates
       the compiled patterns in new memory blocks, setting pointers to them in
       a vector. The first two arguments are a pointer to a suitable vector
       and its length, and the third argument points to a byte stream. The
       final argument is a pointer to a general context, which can be used to
       specify custom memory management functions for the decoded patterns.
       If this argument is NULL, malloc() and free() are used. After
       deserialization, the byte stream is no longer needed and can be
       discarded.

         pcre2_code *list_of_codes[2];
         uint8_t *bytes = <serialized data>;
         int32_t number_of_codes =
           pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);

       If the vector is not large enough for all the patterns in the byte
       stream, it is filled with those that fit, and the remainder are
       ignored. The yield of the function is the number of decoded patterns,
       or one of the following negative error codes:

         PCRE2_ERROR_BADDATA    second argument is zero or less
         PCRE2_ERROR_BADMAGIC   mismatch of id bytes in the data
         PCRE2_ERROR_BADMODE    mismatch of code unit size or PCRE2 version
         PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
         PCRE2_ERROR_NOMEMORY   memory allocation failed
         PCRE2_ERROR_NULL       first or third argument is NULL

       PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was
       compiled on a system with different endianness.

       Decoded patterns can be used for matching in the usual way, and must be
       freed by calling pcre2_code_free(). However, be aware that there is a
       potential race issue if you are using multiple patterns that were
       decoded from a single byte stream in a multithreaded application. A
       single copy of the character tables is used by all the decoded patterns
       and a reference count is used to arrange for its memory to be
       automatically freed when the last pattern is freed, but there is no
       locking on this reference count. Therefore, if you want to call
       pcre2_code_free() for these patterns in different threads, you must
       arrange your own locking, and ensure that pcre2_code_free() cannot be
       called by two threads at the same time.

       If a pattern was processed by pcre2_jit_compile() before being
       serialized, the JIT data is discarded and so is no longer available
       after a save/restore cycle. You can, however, process a restored
       pattern with pcre2_jit_compile() if you wish.

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.

REVISION

       Last updated: 27 June 2018
       Copyright (c) 1997-2018 University of Cambridge.

PCRE2 10.32                      27 June 2018                PCRE2SERIALIZE(3)