Sereal::Decoder(3)    User Contributed Perl Documentation    Sereal::Decoder(3)

NAME
Sereal::Decoder - Fast, compact, powerful binary deserialization
7
SYNOPSIS
    use Sereal::Decoder
        qw(decode_sereal sereal_decode_with_object scalar_looks_like_sereal);

    my $decoder = Sereal::Decoder->new({...options...});

    my $structure;
    $decoder->decode($blob, $structure); # deserializes into $structure

    # or if you don't have references to the top level structure, this works, too:
    $structure = $decoder->decode($blob);

    # alternatively functional interface: (See Sereal::Performance)
    sereal_decode_with_object($decoder, $blob, $structure);
    $structure = sereal_decode_with_object($decoder, $blob);

    # much slower functional interface with no persistent objects:
    decode_sereal($blob, {... options ...}, $structure);
    $structure = decode_sereal($blob, {... options ...});

    # Not a full validation, but just a quick check for a reasonable header:
    my $is_likely_sereal = scalar_looks_like_sereal($some_string);
    # or:
    $is_likely_sereal = $decoder->looks_like_sereal($some_string);
32
DESCRIPTION
This library implements a deserializer for an efficient, compact-output,
and feature-rich binary protocol called Sereal. Its sister module
Sereal::Encoder implements an encoder for this format. The two are
released separately to allow for independent and safer upgrading.
38
39 The Sereal protocol versions that are compatible with this decoder
40 implementation are currently protocol versions 1, 2, 3 and 4. As it
41 stands, it will refuse to attempt to decode future versions of the
42 protocol, but if necessary there is likely going to be an option to
43 decode the parts of the input that are compatible with version 4 of the
44 protocol. The protocol was designed to allow for this.
45
46 The protocol specification and many other bits of documentation can be
47 found in the github repository. Right now, the specification is at
48 <https://github.com/Sereal/Sereal/blob/master/sereal_spec.pod>, there
49 is a discussion of the design objectives in
50 <https://github.com/Sereal/Sereal/blob/master/README.pod>, and the
51 output of our benchmarks can be seen at
52 <https://github.com/Sereal/Sereal/wiki/Sereal-Comparison-Graphs>.
53
55 new
Constructor. Optionally takes a hash reference as first parameter. This
hash reference may contain any number of options that influence the
behaviour of the decoder.

Currently, the following options are recognized; none of them is on by
default.
62
63 refuse_snappy
64
65 If set, the decoder will refuse Snappy-compressed input data. This can
66 be desirable for robustness. See the section "ROBUSTNESS" below.
67
68 refuse_objects
69
70 If set, the decoder will refuse deserializing any objects in the input
71 stream and instead throw an exception. Defaults to off. See the section
72 "ROBUSTNESS" below.
73
74 no_bless_objects
75
76 If set, the decoder will deserialize any objects in the input stream
77 but without blessing them. Defaults to off. See the section
78 "ROBUSTNESS" below.
79
80 validate_utf8
81
82 If set, the decoder will refuse invalid UTF-8 byte sequences. This is
83 off by default, but it's strongly encouraged to be turned on if you're
84 dealing with any data that has been encoded by an external source (e.g.
85 http cookies).
86
87 max_recursion_depth
88
89 "Sereal::Decoder" is recursive. If you pass it a Sereal document that
90 is deeply nested, it will eventually exhaust the C stack. Therefore,
91 there is a limit on the depth of recursion that is accepted. It
92 defaults to 10000 nested calls. You may choose to override this value
93 with the "max_recursion_depth" option. Beware that setting it too high
94 can cause hard crashes.
95
96 Do note that the setting is somewhat approximate. Setting it to 10000
97 may break at somewhere between 9997 and 10003 nested structures
98 depending on their types.
99
100 max_num_hash_entries
101
102 If set to a non-zero value (default: 0), then "Sereal::Decoder" will
103 refuse to deserialize any hash/dictionary (or hash-based object) with
104 more than that number of entries. This is to be able to respond quickly
105 to any future hash-collision attacks on Perl's hash function, and also
106 the memory exhaustion attacks on Sereal itself. For a gentle
107 introduction to the topic from the cryptographic point of view, see
108 <http://en.wikipedia.org/wiki/Collision_attack>.
109
110 max_num_array_entries
111
112 If set to a non-zero value (default: 0), then "Sereal::Decoder" will
113 refuse to deserialize any array with more than that number of entries.
114 This is to be able to respond quickly to any future memory exhaustion
115 attacks on Sereal.
116
117 max_string_length
118
119 If set to a non-zero value (default: 0), then "Sereal::Decoder" will
120 refuse to deserialize any string with more than that number of
121 characters. This is to be able to respond quickly to any future memory
122 exhaustion attacks on Sereal.
123
124 max_uncompressed_size
125
If set to a non-zero value (default: 0), then "Sereal::Decoder" will
refuse to deserialize any blob with a size that exceeds the value when
uncompressed. This is to be able to respond quickly to any future
memory exhaustion attacks on Sereal.
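
As a rough illustration of how the limit options combine, the sketch
below builds a restrictive decoder and shows that an oversized hash is
rejected with a trappable exception. The specific limit values are
arbitrary and chosen only for the example; it also assumes
Sereal::Encoder is installed to produce the test documents.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

# Hypothetical limits, chosen only for illustration.
my $decoder = Sereal::Decoder->new({
    refuse_snappy        => 1,
    validate_utf8        => 1,
    max_recursion_depth  => 100,
    max_num_hash_entries => 3,
});

my $encoder = Sereal::Encoder->new;
my $small   = $encoder->encode({ a => 1, b => 2 });
my $large   = $encoder->encode({ map { ("k$_" => $_) } 1 .. 10 });

# Within the limit: decodes normally.
my $ok = $decoder->decode($small);

# Over the limit: the decoder throws, so wrap the call in eval.
my $too_big = eval { $decoder->decode($large) };
my $error   = $@;
print "rejected: $error" unless defined $too_big;
```

Because a refused document raises an exception rather than returning
undef silently, callers that handle untrusted input will usually want
the eval (or an equivalent try/catch module) around every decode call.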
130
131 incremental
132
133 If set to a non-zero value (default: 0), then "Sereal::Decoder" will
134 destructively parse Sereal documents out of a variable. Every time a
135 Sereal document is successfully parsed it is removed from the front of
136 the string it is parsed from.
137
138 This means you can do this:
139
    while (length $buffer) {
        my $data = decode_sereal($buffer, {incremental => 1});
    }
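
A slightly fuller sketch of the same loop follows. It assumes
Sereal::Encoder is available and uses its exportable "encode_sereal"
function to build a buffer of three concatenated documents; the sample
values are arbitrary.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);

# Three documents concatenated into one buffer.
my $buffer = join '', map { encode_sereal($_) } 'first', [2, 3], { four => 4 };

my @docs;
while (length $buffer) {
    # Each successful call removes the parsed document from the front
    # of $buffer, so the loop ends once the buffer is empty.
    push @docs, decode_sereal($buffer, { incremental => 1 });
}
printf "decoded %d documents\n", scalar @docs;
```

Note that a decoding failure throws, which also ends the loop; wrap it
in eval if trailing garbage is a possibility.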
143
144 alias_smallint
145
If set to a true value then "Sereal::Decoder" will share integers from
-16 to 15 (encoded as either SRL_HDR_NEG or SRL_HDR_POS) as read-only
aliases to a common SV.
149
150 The result of this may be significant space savings in data structures
151 with many integers in the specified range. The cost is more memory used
152 by the decoder and a very modest speed penalty when deserializing.
153
154 Note this option changes the structure of the dumped data. Use with
155 caution.
156
157 See also the "alias_varint_under" option.
158
159 alias_varint_under
160
If set to a true positive integer smaller than 16 then this option is
similar to setting "alias_smallint" and causes all integers from -16 to
15 to be shared as read-only aliases to the same SV, except that this
treatment ALSO applies to SRL_HDR_VARINT. If set to a value larger than
16 then this applies to all varints under the value set. (In general
SRL_HDR_VARINT is used only for integers larger than 15, and
SRL_HDR_NEG and SRL_HDR_POS are used for -16 to -1 and 0 to 15
respectively.)
169
In simple terms, if you want to share values larger than 16 then you
should use this option; if you want to share only values in the -16 to
15 range then you should use the "alias_smallint" option instead.

The result of this may be significant space savings in data structures
with many integers in the desired range. The cost is more memory used
by the decoder and a very modest speed penalty when deserializing.
177
178 Note this option changes the structure of the dumped data. Use with
179 caution.
180
181 use_undef
182
If set to a true value then this causes any undef value to be
deserialized as PL_sv_undef. This may change the structure of the data
being dumped; do not enable this unless you know what you are doing.
186
187 set_readonly
188
189 If set to a true value then the output will be completely readonly
190 (deeply).
191
192 set_readonly_scalars
193
194 If set to a true value then scalars in the output will be readonly
195 (deeply). References won't be readonly.
196
198 decode
199 Given a byte string of Sereal data, the "decode" call deserializes that
200 data structure. The result can be obtained in one of two ways: "decode"
201 accepts a second parameter, which is a scalar to write the result to,
202 AND "decode" will return the resulting data structure.
203
204 The two are subtly different in case of data structures that contain
205 references to the root element. In that case, the return value will be
206 a (non-recursive) copy of the reference. The pass-in style is more
207 correct. In other words,
208
    $decoder->decode($sereal_string, my $out);
    # is almost the same but safer than:
    my $out = $decoder->decode($sereal_string);

This is an unfortunate side-effect of Perl's standard copy semantics of
assignment. Possibly one day we will have an alternative to this.
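
For ordinary data without references to the root, both calling styles
yield equivalent results. The following is a minimal sketch, assuming
Sereal::Encoder is installed to create the input; the sample data is
arbitrary.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

my $blob    = Sereal::Encoder->new->encode({ name => 'example', list => [1, 2, 3] });
my $decoder = Sereal::Decoder->new;

# Pass-in style: the result is written directly into $into.
$decoder->decode($blob, my $into);

# Return style: the result is copied into $returned on assignment.
my $returned = $decoder->decode($blob);

print "$into->{name} / $returned->{list}[2]\n";
```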
215
216 decode_with_header
217 Given a byte string of Sereal data, the "decode_with_header" call
218 deserializes that data structure as "decode" would do, however it also
219 decodes the optional user data structure that can be embedded into a
220 Sereal document, inside the header (see Sereal::Encoder::encode).
221
222 It accepts an optional second parameter, which is a scalar to write the
223 body to, and an optional third parameter, which is a scalar to write
224 the header to.
225
226 Regardless of the number of parameters received, "decode_with_header"
227 returns an ArrayRef containing the deserialized header, and the
228 deserialized body, in this order.
229
See "decode" for the subtle difference between the one-, two- and
three-parameter versions.

If there is no header in a Sereal document, the corresponding variable
or return value will be set to undef.
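
The sketch below illustrates the return value. It assumes that
Sereal::Encoder's "encode" method accepts the user header data as an
optional second argument (see the Sereal::Encoder documentation); the
header and body contents are made up for the example.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

my $encoder = Sereal::Encoder->new;
my $decoder = Sereal::Decoder->new;

# Body first, then the optional user header data.
my $blob = $encoder->encode({ payload => [1, 2, 3] }, { schema => 'v1' });

# The returned array reference holds the header first, then the body.
my ($header, $body) = @{ $decoder->decode_with_header($blob) };

# A document encoded without user header data yields undef for the header.
my $no_header = $decoder->decode_with_header($encoder->encode('plain'));

print "schema: $header->{schema}, items: ", scalar @{ $body->{payload} }, "\n";
```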
235
236 decode_only_header
237 Given a byte string of Sereal data, the "decode_only_header"
238 deserializes only the optional user data structure that can be embedded
239 into a Sereal document, inside the header (see
240 Sereal::Encoder::encode).
241
242 It accepts an optional second parameter, which is a scalar to write the
243 header to.
244
245 Regardless of the number of parameters received, "decode_only_header"
246 returns the resulting data structure.
247
See "decode" for the subtle difference between the one- and
two-parameter versions.

If there is no header in a Sereal document, the corresponding variable
or return value will be set to undef.
253
254 decode_with_offset
255 Same as the "decode" method, except as second parameter, you must pass
256 an integer offset into the input string, at which the decoding is to
257 start. The optional "pass-in" style scalar (see "decode" above) is
258 relegated to being the third parameter.
259
260 decode_only_header_with_offset
261 Same as the "decode_only_header" method, except as second parameter,
262 you must pass an integer offset into the input string, at which the
263 decoding is to start. The optional "pass-in" style scalar (see
264 "decode_only_header" above) is relegated to being the third parameter.
265
266 decode_with_header_and_offset
267 Same as the "decode_with_header" method, except as second parameter,
268 you must pass an integer offset into the input string, at which the
269 decoding is to start. The optional "pass-in" style scalars (see
270 "decode_with_header" above) are relegated to being the third and fourth
271 parameters.
272
273 bytes_consumed
274 After using the various "decode" methods documented previously,
275 "bytes_consumed" can return the number of bytes from the body of the
276 input string that were actually consumed by the decoder. That is, if
277 you append random garbage to a valid Sereal document, "decode" will
278 happily decode the data and ignore the garbage. If that is an error in
279 your use case, you can use "bytes_consumed" to catch it.
280
    my $out = $decoder->decode($sereal_string);
    if (length($sereal_string) != $decoder->bytes_consumed) {
        die "Not all input data was consumed!";
    }
285
286 Chances are that if you do this, you're violating UNIX philosophy in
287 "be strict in what you emit but lenient in what you accept".
288
289 You can also use this to deserialize a list of Sereal documents that is
290 concatenated into the same string (code not very robust...):
291
    my @out;
    my $pos = 0;
    eval {
        while (1) {
            push @out, $decoder->decode_with_offset($sereal_string, $pos);
            $pos += $decoder->bytes_consumed;
            last if $pos >= length($sereal_string)
                or not $decoder->bytes_consumed;
        }
    };
302
303 As mentioned, only the bytes consumed from the body are considered. So
304 the following example is correct, as only the header is deserialized:
305
    my $header = $decoder->decode_only_header($sereal_string);
    my $count = $decoder->bytes_consumed;
    # $count is 0
309
310 decode_from_file
    Sereal::Decoder->decode_from_file($file);
    $decoder->decode_from_file($file);

Read and decode the file specified. If called in list context and
incremental mode is enabled then decodes all packets contained in the
file and returns a list, otherwise decodes the first (or only) packet
in the file. Accepts an optional "target" variable as a second argument.
318
319 looks_like_sereal
320 Performs some rudimentary check to determine if the argument appears to
321 be a valid Sereal packet or not. These tests are not comprehensive and
322 a true result does not mean that the document is valid, merely that it
323 appears to be valid. On the other hand a false result is always
324 reliable.
325
326 The return of this method may be treated as a simple boolean but is in
327 fact a more complex return. When the argument does not look anything
328 like a Sereal document then the return is perl's FALSE, which has the
329 property of being string equivalent to "" and numerically equivalent to
330 0. However when the argument appears to be a UTF-8 encoded protocol 3
331 Sereal document (by noticing that the \xF3 in the magic string has been
332 replaced by \xC3\xB3) then it returns 0 (the number, which is string
333 equivalent to "0"), and otherwise returns the protocol version of the
334 document. This means you can write something like this:
335
    $type = Sereal::Decoder->looks_like_sereal($thing);
    if ($type eq '') {
        say "Not a Sereal document";
    } elsif ($type eq '0') {
        say "Possibly utf8 encoded Sereal document";
    } else {
        say "Sereal document version $type";
    }
344
345 For reference, Sereal's magic value is a four byte string which is
346 either "=srl" for protocol version 1 and 2 or "=\xF3rl" for protocol
347 version 3 and later. This function checks that the magic string
348 corresponds with the reported version number, as well as other checks,
349 which may be enhanced in the future.
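
To make the magic-string rule concrete, here is a pure-Perl sketch of
the same idea. The helper "guess_sereal_magic" is hypothetical and is
not part of Sereal::Decoder; it inspects only the first five bytes and
performs none of the additional checks the real function does. It also
assumes, per the Sereal specification, that the low four bits of the
byte following the magic string carry the protocol version.

```perl
use strict;
use warnings;

# Hypothetical helper: NOT part of Sereal::Decoder and far less thorough
# than looks_like_sereal(); it only inspects the first five bytes.
sub guess_sereal_magic {
    my ($bytes) = @_;
    return '' unless defined $bytes && length($bytes) >= 5;

    # "=\xF3rl" that was accidentally UTF-8 encoded: \xF3 became \xC3\xB3.
    return 0 if substr($bytes, 0, 5) eq "=\xC3\xB3rl";

    my $magic = substr($bytes, 0, 4);

    # Assumption from the Sereal spec: the low four bits of the fifth
    # byte hold the protocol version.
    my $version = ord(substr($bytes, 4, 1)) & 0x0F;

    # The magic string must agree with the reported version.
    return $version if $magic eq "=srl"    && ($version == 1 || $version == 2);
    return $version if $magic eq "=\xF3rl" && $version >= 3;
    return '';
}

for my $candidate ("=\xF3rl\x04data", "=srl\x02data", "=\xC3\xB3rl\x04", "hello world") {
    my $type = guess_sereal_magic($candidate);
    print $type eq ''  ? "Not a Sereal document\n"
        : $type eq '0' ? "Possibly utf8 encoded Sereal document\n"
        :                "Sereal document version $type\n";
}
```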
350
351 Note that looks_like_sereal() may be called as a class or object
352 method, and may also be called as a single argument function. See the
353 related scalar_looks_like_sereal() for a version which may ONLY be
354 called as a function, not as a method (and which is typically much
355 faster).
356
358 sereal_decode_with_object
359 The functional interface that is equivalent to using "decode". Takes a
360 decoder object reference as first parameter, followed by a byte string
361 to deserialize. Optionally takes a third parameter, which is the
362 output scalar to write to. See the documentation for "decode" above for
363 details.
364
365 This functional interface is marginally faster than the OO interface
366 since it avoids method resolution overhead and, on sufficiently modern
367 Perl versions, can usually avoid subroutine call overhead. See
368 Sereal::Performance for a discussion on how to tune Sereal for maximum
369 performance if you need to.
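
A minimal sketch of the functional interface with reusable objects
follows. It assumes Sereal::Encoder is installed and exports the
matching "sereal_encode_with_object" function; the sample data is
arbitrary.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(sereal_encode_with_object);
use Sereal::Decoder qw(sereal_decode_with_object);

# Create the objects once and reuse them for every call.
my $encoder = Sereal::Encoder->new;
my $decoder = Sereal::Decoder->new;

my $blob = sereal_encode_with_object($encoder, [1, 2, 3]);

# Pass-in style: third argument receives the result.
my $out;
sereal_decode_with_object($decoder, $blob, $out);

# Return style.
my $same = sereal_decode_with_object($decoder, $blob);

print "last element: $out->[2] / $same->[2]\n";
```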
370
371 sereal_decode_with_header_with_object
372 The functional interface that is equivalent to using
373 "decode_with_header". Takes a decoder object reference as first
374 parameter, followed by a byte string to deserialize. Optionally takes
375 third and fourth parameters, which are the output scalars to write to.
376 See the documentation for "decode_with_header" above for details.
377
378 This functional interface is marginally faster than the OO interface
379 since it avoids method resolution overhead and, on sufficiently modern
380 Perl versions, can usually avoid subroutine call overhead. See
381 Sereal::Performance for a discussion on how to tune Sereal for maximum
382 performance if you need to.
383
384 sereal_decode_only_header_with_object
The functional interface that is equivalent to using
"decode_only_header". Takes a decoder object reference as first
parameter, followed by a byte string to deserialize. Optionally takes a
third parameter, which is the output scalar to write to. See the
documentation for "decode_only_header" above for details.
390
391 This functional interface is marginally faster than the OO interface
392 since it avoids method resolution overhead and, on sufficiently modern
393 Perl versions, can usually avoid subroutine call overhead. See
394 Sereal::Performance for a discussion on how to tune Sereal for maximum
395 performance if you need to.
396
397 sereal_decode_only_header_with_offset_with_object
398 The functional interface that is equivalent to using
399 "decode_only_header_with_offset". Same as the
400 "sereal_decode_only_header_with_object" function, except as the third
401 parameter, you must pass an integer offset into the input string, at
402 which the decoding is to start. The optional "pass-in" style scalar
403 (see "sereal_decode_only_header_with_object" above) is relegated to
404 being the fourth parameter.
405
406 This functional interface is marginally faster than the OO interface
407 since it avoids method resolution overhead and, on sufficiently modern
408 Perl versions, can usually avoid subroutine call overhead. See
409 Sereal::Performance for a discussion on how to tune Sereal for maximum
410 performance if you need to.
411
412 sereal_decode_with_header_and_offset_with_object
413 The functional interface that is equivalent to using
414 "decode_with_header_and_offset". Same as the
415 "sereal_decode_with_header_with_object" function, except as the third
416 parameter, you must pass an integer offset into the input string, at
417 which the decoding is to start. The optional "pass-in" style scalars
418 (see "sereal_decode_with_header_with_object" above) are relegated to
419 being the fourth and fifth parameters.
420
421 This functional interface is marginally faster than the OO interface
422 since it avoids method resolution overhead and, on sufficiently modern
423 Perl versions, can usually avoid subroutine call overhead. See
424 Sereal::Performance for a discussion on how to tune Sereal for maximum
425 performance if you need to.
426
427 sereal_decode_with_offset_with_object
The functional interface that is equivalent to using
"decode_with_offset". Same as the "sereal_decode_with_object"
function, except as the third parameter, you must pass an integer
offset into the input string, at which the decoding is to start. The
optional "pass-in" style scalar (see "sereal_decode_with_object" above)
is relegated to being the fourth parameter.
434
435 This functional interface is marginally faster than the OO interface
436 since it avoids method resolution overhead and, on sufficiently modern
437 Perl versions, can usually avoid subroutine call overhead. See
438 Sereal::Performance for a discussion on how to tune Sereal for maximum
439 performance if you need to.
440
441 decode_sereal
442 The functional interface that is equivalent to using "new" and
443 "decode". Expects a byte string to deserialize as first argument,
444 optionally followed by a hash reference of options (see documentation
445 for "new()"). Finally, "decode_sereal" supports a third parameter,
446 which is the output scalar to write to. See the documentation for
447 "decode" above for details.
448
449 This functional interface is significantly slower than the OO interface
450 since it cannot reuse the decoder object.
451
452 decode_sereal_with_header_data
The functional interface that is equivalent to using "new" and
"decode_with_header". Expects a byte string to deserialize as first
argument, optionally followed by a hash reference of options (see
documentation for "new()"). Finally, "decode_sereal_with_header_data"
supports third and fourth parameters, which are the output scalars to
write to. See the documentation for "decode_with_header" above for
details.
459
460 This functional interface is significantly slower than the OO interface
461 since it cannot reuse the decoder object.
462
463 scalar_looks_like_sereal
464 The functional interface that is equivalent to using
465 "looks_like_sereal".
466
Note that this version cannot be called as a method. It is normally
executed as a custom opcode; as such, errors about its usage may be
caught at compile time, and it should be much faster than
looks_like_sereal.
471
ROBUSTNESS
This implementation of a Sereal decoder tries to be as robust to
invalid input data as reasonably possible. This means that it should
never (though read on) segfault. It may, however, cause a large malloc
to fail. Generally speaking, invalid data should cause a Perl-trappable
exception. The one exception is that for Snappy-compressed Sereal
documents, the Snappy library may cause segmentation faults (invalid
reads or writes). This should only be a problem if you do not checksum
your data (internal checksum support is a To-Do) or if you accept data
from potentially malicious sources.
482
It requires a lot of run-time boundary checks to prevent decoder
segmentation faults on invalid data. We implemented them in the
lightest way possible. Adding robustness against running out of memory
would cause a very significant run-time overhead. In most cases of
random garbage (with a valid header, no less) where a malloc() fails
due to invalid data, the problem is caused by a very large array or
string length. This kind of very large malloc can then fail in a way
that is trappable from Perl. Only when a packet causes many repeated
allocations do you risk causing a hard OOM error from the kernel that
cannot be trapped, because Perl may require some small allocations to
succeed before the now-invalid memory is released. It is at least not
entirely trivial to craft a Sereal document that causes this behaviour.
495
Finally, deserializing proper objects is potentially a problem because
classes can define a destructor. Thus, the data fed to the decoder can
cause the (deferred) execution of any destructor in your application.
That is why the "refuse_objects" option exists, and it is also what the
"no_bless_objects" option can be used for. Later on, we may or may not
provide a facility to whitelist classes. Furthermore, if the encoder
emitted any objects using "FREEZE" callbacks, the "THAW" class method
may be invoked on the respective classes. If you can't trust the source
of your Sereal documents, you may want to use the "refuse_objects"
option. For more details on the "FREEZE/THAW" mechanism, please refer
to Sereal::Encoder.
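
The difference between the two object-related options can be sketched
as follows. "My::Point" is a hypothetical class name used only for this
illustration, and the example assumes Sereal::Encoder is installed and
serializes blessed references with its default settings.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use Sereal::Encoder;
use Sereal::Decoder;

# 'My::Point' is a hypothetical class used only for this illustration.
my $object = bless { x => 1, y => 2 }, 'My::Point';
my $blob   = Sereal::Encoder->new->encode($object);

# refuse_objects: decoding any object throws a trappable exception.
my $strict  = Sereal::Decoder->new({ refuse_objects => 1 });
my $refused = eval { $strict->decode($blob); 1 } ? 0 : 1;

# no_bless_objects: the data is returned, but as a plain hash reference,
# so no destructor of My::Point can run on it.
my $plain = Sereal::Decoder->new({ no_bless_objects => 1 })->decode($blob);

print "refused: $refused, blessed: ", (blessed($plain) ? 'yes' : 'no'), "\n";
```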
507
509 Please refer to the Sereal::Performance document that has more detailed
510 information about Sereal performance and tuning thereof.
511
"Sereal::Decoder" is thread-safe on Perl 5.8.7 and higher. This means
"thread-safe" in the sense that if you create a new thread, all
"Sereal::Decoder" objects will become a reference to undef in the new
thread. This might change in a future release to become a full clone of
the decoder object.
518
520 For reporting bugs, please use the github bug tracker at
521 <http://github.com/Sereal/Sereal/issues>.
522
523 For support and discussion of Sereal, there are two Google Groups:
524
525 Announcements around Sereal (extremely low volume):
526 <https://groups.google.com/forum/?fromgroups#!forum/sereal-announce>
527
528 Sereal development list:
529 <https://groups.google.com/forum/?fromgroups#!forum/sereal-dev>
530
532 Yves Orton <demerphq@gmail.com>
533
534 Damian Gryski
535
536 Steffen Mueller <smueller@cpan.org>
537
538 Rafaël Garcia-Suarez
539
540 Ævar Arnfjörð Bjarmason <avar@cpan.org>
541
542 Tim Bunce
543
544 Daniel Dragan <bulkdd@cpan.org> (Windows support and bugfixes)
545
546 Zefram
547
548 Borislav Nikolov
549
550 Ivan Kruglov <ivan.kruglov@yahoo.com>
551
552 Eric Herman <eric@freesa.org>
553
554 Some inspiration and code was taken from Marc Lehmann's excellent
555 JSON::XS module due to obvious overlap in problem domain.
556
558 This module was originally developed for Booking.com. With approval
559 from Booking.com, this module was generalized and published on CPAN,
560 for which the authors would like to express their gratitude.
561
Copyright (C) 2012, 2013, 2014 by Steffen Mueller
Copyright (C) 2012, 2013, 2014 by Yves Orton
565
566 The license for the code in this distribution is the following, with
567 the exceptions listed below:
568
569 This library is free software; you can redistribute it and/or modify it
570 under the same terms as Perl itself.
571
572 Except portions taken from Marc Lehmann's code for the JSON::XS module,
573 which is licensed under the same terms as this module. (Many thanks to
574 Marc for inspiration, and code.)
575
576 Also except the code for Snappy compression library, whose license is
577 reproduced below and which, to the best of our knowledge, is compatible
578 with this module's license. The license for the enclosed Snappy code
579 is:
580
581 Copyright 2011, Google Inc.
582 All rights reserved.
583
584 Redistribution and use in source and binary forms, with or without
585 modification, are permitted provided that the following conditions are
586 met:
587
588 * Redistributions of source code must retain the above copyright
589 notice, this list of conditions and the following disclaimer.
590 * Redistributions in binary form must reproduce the above
591 copyright notice, this list of conditions and the following disclaimer
592 in the documentation and/or other materials provided with the
593 distribution.
594 * Neither the name of Google Inc. nor the names of its
595 contributors may be used to endorse or promote products derived from
596 this software without specific prior written permission.
597
598 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
599 "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
600 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
601 A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
602 OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
603 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
604 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
605 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
606 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
607 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
608 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
609
610
611
perl v5.32.0                      2020-08-04                Sereal::Decoder(3)