PCRE2SERIALIZE(3)          Library Functions Manual          PCRE2SERIALIZE(3)

NAME

       PCRE2 - Perl-compatible regular expressions (revised API)

SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS

       int32_t pcre2_serialize_decode(pcre2_code **codes,
         int32_t number_of_codes, const uint8_t *bytes,
         pcre2_general_context *gcontext);

       int32_t pcre2_serialize_encode(pcre2_code **codes,
         int32_t number_of_codes, uint8_t **serialized_bytes,
         PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);

       void pcre2_serialize_free(uint8_t *bytes);

       int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes);

       If you are running an application that uses a large number of regular
       expression patterns, it may be useful to store them in a precompiled
       form instead of having to compile them every time the application is
       run. However, if you are using the just-in-time optimization feature,
       it is not possible to save and reload the JIT data, because it is
       position-dependent. The host on which the patterns are reloaded must be
       running the same version of PCRE2, with the same code unit width, and
       must also have the same endianness, pointer width, and PCRE2_SIZE type.
       For example, patterns compiled on a 32-bit system using PCRE2's 16-bit
       library cannot be reloaded on a 64-bit system, nor can they be reloaded
       using the 8-bit library.

SECURITY CONCERNS

       The facility for saving and restoring compiled patterns is intended for
       use within individual applications. As such, the data supplied to
       pcre2_serialize_decode() is expected to be trusted data, not data from
       arbitrary external sources. Only some simple consistency checks are
       made; what is being reloaded is not fully validated.

SAVING COMPILED PATTERNS

       Before compiled patterns can be saved they must be serialized, that is,
       converted to a stream of bytes. A single byte stream may contain any
       number of compiled patterns, but they must all use the same character
       tables. A single copy of the tables is included in the byte stream (its
       size is 1088 bytes). For more details of character tables, see the
       section on locale support in the pcre2api documentation.

       The function pcre2_serialize_encode() creates a serialized byte stream
       from a list of compiled patterns. Its first two arguments specify the
       list, being a pointer to a vector of pointers to compiled patterns, and
       the length of the vector. The third and fourth arguments point to
       variables which are set to point to the created byte stream and its
       length, respectively. The final argument is a pointer to a general
       context, which can be used to specify custom memory management
       functions. If this argument is NULL, malloc() is used to obtain memory
       for the byte stream. The yield of the function is the number of
       serialized patterns, or one of the following negative error codes:

         PCRE2_ERROR_BADDATA      the number of patterns is zero or less
         PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
         PCRE2_ERROR_NOMEMORY     memory allocation failed
         PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
         PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL

       PCRE2_ERROR_BADMAGIC means either that a pattern's code has been
       corrupted, or that a slot in the vector does not point to a compiled
       pattern.

       Once a set of patterns has been serialized you can save the data in any
       appropriate manner. Here is sample code that compiles two patterns and
       writes them to a file. It assumes that the variable fd is a FILE *
       referring to a file that is open for output. The error checking that
       should be present in a real application has been omitted for
       simplicity.

         int errorcode;
         uint8_t *bytes;
         PCRE2_SIZE erroroffset;
         PCRE2_SIZE bytescount;
         pcre2_code *list_of_codes[2];
         /* Compile the patterns (8-bit library assumed). */
         list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
           PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
         list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
           PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
         /* Serialize both into one byte stream, then write it out. */
         errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
           &bytescount, NULL);
         fwrite(bytes, 1, bytescount, fd);
         pcre2_serialize_free(bytes);

       Note that the serialized data is binary data that may contain any of
       the 256 possible byte values. On systems that make a distinction
       between binary and non-binary data, be sure that the file is opened for
       binary output.
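
       In standard C, binary mode is requested by including "b" in the mode
       string passed to fopen(). The sketch below writes a byte stream to a
       named file and reports whether every byte reached it; save_serialized()
       is a hypothetical helper, and passing a PCRE2_SIZE length to it assumes
       the default definition of PCRE2_SIZE as size_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a serialized byte stream to path in binary
   mode. Returns 0 on success, -1 on any failure. */
int save_serialized(const char *path, const uint8_t *bytes, size_t size)
{
  FILE *f = fopen(path, "wb");     /* "b": no newline or EOF translation */
  if (f == NULL) return -1;

  size_t written = fwrite(bytes, 1, size, f);
  /* fclose() flushes buffered data, so its result matters too. */
  int close_failed = fclose(f) != 0;
  return (written == size && !close_failed) ? 0 : -1;
}
```

       With the variables from the sample above, a call might be
       save_serialized("patterns.bin", bytes, bytescount) (the file name is
       illustrative), followed by pcre2_serialize_free(bytes).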

       Serializing a set of patterns leaves the original compiled patterns
       untouched, so they can still be used for matching. Their memory must
       eventually be freed in the usual way by calling pcre2_code_free(). When
       you have finished with the byte stream, it too must be freed by calling
       pcre2_serialize_free().

RE-USING PRECOMPILED PATTERNS

       In order to re-use a set of saved patterns you must first make the
       serialized byte stream available in main memory (for example, by
       reading from a file). The management of this memory block is up to the
       application. You can use the pcre2_serialize_get_number_of_codes()
       function to find out how many compiled patterns are in the serialized
       data without actually decoding the patterns:

         uint8_t *bytes = <serialized data>;
         int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
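
       One way to make the stream available is to read the whole file into a
       block obtained from malloc(), which the application releases with
       free() after decoding. This sketch uses only standard C;
       load_serialized() is a hypothetical helper, and finding the length with
       fseek() and ftell() assumes a platform, such as POSIX or Windows, where
       that works for binary files.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole of a binary file into a newly
   malloc()ed block. On success returns the block (release it with free())
   and stores its length in *size; returns NULL on failure or if the file
   is empty. */
uint8_t *load_serialized(const char *path, size_t *size)
{
  FILE *f = fopen(path, "rb");     /* binary mode, matching the writer */
  uint8_t *buffer = NULL;
  long length = 0;

  if (f == NULL) return NULL;
  if (fseek(f, 0, SEEK_END) == 0 && (length = ftell(f)) > 0 &&
      fseek(f, 0, SEEK_SET) == 0 &&
      (buffer = malloc((size_t)length)) != NULL) {
    if (fread(buffer, 1, (size_t)length, f) == (size_t)length) {
      *size = (size_t)length;
    } else {                       /* short read: discard the block */
      free(buffer);
      buffer = NULL;
    }
  }
  fclose(f);
  return buffer;
}
```

       The returned block can be passed directly to
       pcre2_serialize_get_number_of_codes() and pcre2_serialize_decode();
       memory from malloc() is suitably aligned for any object type.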

       The pcre2_serialize_decode() function reads a byte stream and recreates
       the compiled patterns in new memory blocks, setting pointers to them in
       a vector. The first two arguments are a pointer to a suitable vector
       and its length, and the third argument points to a byte stream. The
       final argument is a pointer to a general context, which can be used to
       specify custom memory management functions for the decoded patterns.
       If this argument is NULL, malloc() and free() are used. After
       deserialization, the byte stream is no longer needed and can be
       discarded.

         pcre2_code *list_of_codes[2];
         uint8_t *bytes = <serialized data>;
         /* Decode at most 2 patterns; the yield is the number decoded. */
         int32_t number_of_codes =
           pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);

       If the vector is not large enough for all the patterns in the byte
       stream, it is filled with those that fit, and the remainder are
       ignored. The yield of the function is the number of decoded patterns,
       or one of the following negative error codes:

         PCRE2_ERROR_BADDATA    second argument is zero or less
         PCRE2_ERROR_BADMAGIC   mismatch of id bytes in the data
         PCRE2_ERROR_BADMODE    mismatch of code unit size or PCRE2 version
         PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
         PCRE2_ERROR_NOMEMORY   memory allocation failed
         PCRE2_ERROR_NULL       first or third argument is NULL

       PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was
       compiled on a system with different endianness.

       Decoded patterns can be used for matching in the usual way, and must be
       freed by calling pcre2_code_free(). However, be aware that there is a
       potential race condition if you are using multiple patterns that were
       decoded from a single byte stream in a multithreaded application. A
       single copy of the character tables is used by all the decoded patterns
       and a reference count is used to arrange for its memory to be
       automatically freed when the last pattern is freed, but there is no
       locking on this reference count. Therefore, if you want to call
       pcre2_code_free() for these patterns in different threads, you must
       arrange your own locking, and ensure that pcre2_code_free() cannot be
       called by two threads at the same time.

       If a pattern was processed by pcre2_jit_compile() before being
       serialized, the JIT data is discarded and so is no longer available
       after a save/restore cycle. You can, however, process a restored
       pattern with pcre2_jit_compile() if you wish.

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.

REVISION

       Last updated: 24 May 2016
       Copyright (c) 1997-2016 University of Cambridge.

PCRE2 10.22                       24 May 2016                PCRE2SERIALIZE(3)