PCRE2SERIALIZE(3)        Library Functions Manual        PCRE2SERIALIZE(3)

PCRE2 - Perl-compatible regular expressions (revised API)

int32_t pcre2_serialize_decode(pcre2_code **codes,
  int32_t number_of_codes, const uint8_t *bytes,
  pcre2_general_context *gcontext);

int32_t pcre2_serialize_encode(const pcre2_code **codes,
  int32_t number_of_codes, uint8_t **serialized_bytes,
  PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);

void pcre2_serialize_free(uint8_t *bytes);

int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes);

If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled
form instead of having to compile them every time the application is
run. However, if you are using the just-in-time optimization feature,
it is not possible to save and reload the JIT data, because it is
position-dependent. The host on which the patterns are reloaded must be
running the same version of PCRE2, with the same code unit width, and
must also have the same endianness, pointer width and PCRE2_SIZE type.
For example, patterns compiled on a 32-bit system using PCRE2's 16-bit
library cannot be reloaded on a 64-bit system, nor can they be reloaded
using the 8-bit library.

Note that "serialization" in PCRE2 does not convert compiled patterns
to an abstract format like Java or .NET serialization. The serialized
output is really just a bytecode dump, which is why it can only be
reloaded in the same environment as the one that created it. Hence the
restrictions mentioned above. Applications that are not statically
linked with a fixed version of PCRE2 must be prepared to recompile
patterns from their sources, in order to be immune to PCRE2 upgrades.

The facility for saving and restoring compiled patterns is intended for
use within individual applications. As such, the data supplied to
pcre2_serialize_decode() is expected to be trusted data, not data from
arbitrary external sources. There is only some simple consistency
checking, not complete validation of what is being reloaded. Corrupted
data may cause undefined results. For example, if the length field of a
pattern in the serialized data is corrupted, the deserializing code may
read beyond the end of the byte stream that is passed to it.

Before compiled patterns can be saved they must be serialized, which in
PCRE2 means converting the pattern to a stream of bytes. A single byte
stream may contain any number of compiled patterns, but they must all
use the same character tables. A single copy of the tables is included
in the byte stream (its size is 1088 bytes). For more details of
character tables, see the section on locale support in the pcre2api
documentation.

The function pcre2_serialize_encode() creates a serialized byte stream
from a list of compiled patterns. Its first two arguments specify the
list, being a pointer to a vector of pointers to compiled patterns, and
the length of the vector. The third and fourth arguments point to
variables which are set to point to the created byte stream and its
length, respectively. The final argument is a pointer to a general
context, which can be used to specify custom memory management
functions. If this argument is NULL, malloc() is used to obtain memory
for the byte stream. The yield of the function is the number of
serialized patterns, or one of the following negative error codes:

  PCRE2_ERROR_BADDATA      the number of patterns is zero or less
  PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
  PCRE2_ERROR_NOMEMORY     memory allocation failed
  PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
  PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL

PCRE2_ERROR_BADMAGIC means either that a pattern's code has been
corrupted, or that a slot in the vector does not point to a compiled
pattern.

Once a set of patterns has been serialized you can save the data in any
appropriate manner. Here is sample code that compiles two patterns and
writes them to a file. It assumes that the variable fd is a FILE *
stream that is open for output. The error checking that should be
present in a real application has been omitted for simplicity.

  int errorcode;
  uint8_t *bytes;
  PCRE2_SIZE erroroffset;
  PCRE2_SIZE bytescount;
  pcre2_code *list_of_codes[2];

  /* Compile the two patterns that are to be saved. */
  list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
  list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);

  /* Serialize both patterns into a single byte stream. */
  errorcode = pcre2_serialize_encode((const pcre2_code **)list_of_codes,
    2, &bytes, &bytescount, NULL);

  /* Write the byte stream to the output file. */
  errorcode = fwrite(bytes, 1, bytescount, fd);

Note that the serialized data is binary data that may contain any of
the 256 possible byte values. On systems that make a distinction
between binary and non-binary data, be sure that the file is opened for
binary output.
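
As a minimal sketch of binary output with the standard C library, the
following helper opens the file in "wb" mode. The name save_serialized
and the choice of a path argument are assumptions made for this
illustration; they are not part of PCRE2.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of PCRE2): write a serialized byte
   stream to a file. The "wb" mode requests binary output, which matters
   on systems that translate line endings in text mode. Returns 0 on
   success and -1 on failure. */
static int save_serialized(const char *path, const uint8_t *bytes,
  size_t size)
{
  FILE *f = fopen(path, "wb");
  if (f == NULL) return -1;

  size_t written = fwrite(bytes, 1, size, f);

  /* fclose() flushes buffered data, so its result is checked as well. */
  int close_failed = (fclose(f) != 0);

  return (written == size && !close_failed) ? 0 : -1;
}
```

The bytes and size arguments would be the values set by
pcre2_serialize_encode() through its third and fourth arguments.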

Serializing a set of patterns leaves the originals untouched, so they
can still be used for matching. Their memory must eventually be freed
in the usual way by calling pcre2_code_free(). When you have finished
with the byte stream, it too must be freed by calling
pcre2_serialize_free(). If this function is called with a NULL
argument, it returns immediately without doing anything.

In order to re-use a set of saved patterns you must first make the
serialized byte stream available in main memory (for example, by
reading from a file). The management of this memory block is up to the
application. You can use the pcre2_serialize_get_number_of_codes()
function to find out how many compiled patterns are in the serialized
data without actually decoding the patterns:

  uint8_t *bytes = <serialized data>;
  int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
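
Making the byte stream available in memory could look like the
following sketch, which reads a whole file using only the standard C
library. The helper name load_serialized is hypothetical and not part
of PCRE2.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of PCRE2): read a whole file into a
   malloc()ed buffer. On success returns the buffer and stores its
   length in *size. Returns NULL on failure. */
static uint8_t *load_serialized(const char *path, size_t *size)
{
  FILE *f = fopen(path, "rb");        /* binary mode, as for writing */
  if (f == NULL) return NULL;

  uint8_t *buf = NULL;
  size_t used = 0, cap = 0;
  for (;;) {
    if (used == cap) {                /* grow the buffer geometrically */
      size_t newcap = cap ? cap * 2 : 4096;
      uint8_t *tmp = realloc(buf, newcap);
      if (tmp == NULL) { free(buf); fclose(f); return NULL; }
      buf = tmp;
      cap = newcap;
    }
    size_t n = fread(buf + used, 1, cap - used, f);
    used += n;
    if (n == 0) break;                /* end of file or read error */
  }

  if (ferror(f)) { free(buf); fclose(f); return NULL; }
  fclose(f);
  *size = used;
  return buf;
}
```

Because this buffer was obtained with malloc() by the application, not
by pcre2_serialize_encode(), the sketch expects it to be released with
free().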

The pcre2_serialize_decode() function reads a byte stream and recreates
the compiled patterns in new memory blocks, setting pointers to them in
a vector. The first two arguments are a pointer to a suitable vector
and its length, and the third argument points to a byte stream. The
final argument is a pointer to a general context, which can be used to
specify custom memory management functions for the decoded patterns.
If this argument is NULL, malloc() and free() are used. After
deserialization, the byte stream is no longer needed and can be
discarded.

  pcre2_code *list_of_codes[2];
  uint8_t *bytes = <serialized data>;
  int32_t number_of_codes =
    pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);

If the vector is not large enough for all the patterns in the byte
stream, it is filled with those that fit, and the remainder are
ignored. The yield of the function is the number of decoded patterns,
or one of the following negative error codes:

  PCRE2_ERROR_BADDATA            second argument is zero or less
  PCRE2_ERROR_BADMAGIC           mismatch of id bytes in the data
  PCRE2_ERROR_BADMODE            mismatch of code unit size or PCRE2 version
  PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
  PCRE2_ERROR_NOMEMORY           memory allocation failed
  PCRE2_ERROR_NULL               first or third argument is NULL

PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was
compiled on a system with different endianness.

Decoded patterns can be used for matching in the usual way, and must be
freed by calling pcre2_code_free(). However, be aware that there is a
potential race issue if you are using multiple patterns that were
decoded from a single byte stream in a multithreaded application. A
single copy of the character tables is used by all the decoded patterns
and a reference count is used to arrange for its memory to be
automatically freed when the last pattern is freed, but there is no
locking on this reference count. Therefore, if you want to call
pcre2_code_free() for these patterns in different threads, you must
arrange your own locking, and ensure that pcre2_code_free() cannot be
called by two threads at the same time.

If a pattern was processed by pcre2_jit_compile() before being
serialized, the JIT data is discarded and so is no longer available
after a save/restore cycle. You can, however, process a restored
pattern with pcre2_jit_compile() if you wish.

Philip Hazel
University Computing Service
Cambridge, England.

Last updated: 27 June 2018
Copyright (c) 1997-2018 University of Cambridge.

PCRE2 10.32                    27 June 2018              PCRE2SERIALIZE(3)