Sereal::Decoder(3) User Contributed Perl Documentation Sereal::Decoder(3)
2
3
4
6 Sereal::Decoder - Fast, compact, powerful binary deserialization
7
    use Sereal::Decoder
        qw(decode_sereal sereal_decode_with_object scalar_looks_like_sereal);

    my $decoder = Sereal::Decoder->new({...options...});

    my $structure;
    $decoder->decode($blob, $structure); # deserializes into $structure

    # or if you don't have references to the top level structure, this works, too:
    $structure = $decoder->decode($blob);

    # alternatively functional interface: (See Sereal::Performance)
    sereal_decode_with_object($decoder, $blob, $structure);
    $structure = sereal_decode_with_object($decoder, $blob);

    # much slower functional interface with no persistent objects:
    decode_sereal($blob, {... options ...}, $structure);
    $structure = decode_sereal($blob, {... options ...});

    # Not a full validation, but just a quick check for a reasonable header:
    my $is_likely_sereal = scalar_looks_like_sereal($some_string);
    # or:
    $is_likely_sereal = $decoder->looks_like_sereal($some_string);
32
34 This library implements a deserializer for an efficient, compact-
35 output, and feature-rich binary protocol called Sereal. Its sister
36 module Sereal::Encoder implements an encoder for this format. The two
37 are released separately to allow for independent and safer upgrading.
38
39 The Sereal protocol versions that are compatible with this decoder
40 implementation are currently protocol versions 1, 2, 3 and 4. As it
41 stands, it will refuse to attempt to decode future versions of the
42 protocol, but if necessary there is likely going to be an option to
43 decode the parts of the input that are compatible with version 4 of the
44 protocol. The protocol was designed to allow for this.
45
46 The protocol specification and many other bits of documentation can be
47 found in the github repository. Right now, the specification is at
48 <https://github.com/Sereal/Sereal/blob/master/sereal_spec.pod>, there
49 is a discussion of the design objectives in
50 <https://github.com/Sereal/Sereal/blob/master/README.pod>, and the
51 output of our benchmarks can be seen at
52 <https://github.com/Sereal/Sereal/wiki/Sereal-Comparison-Graphs>.
53
55 new
Constructor. Optionally takes a hash reference as first parameter. This
hash reference may contain any number of options that influence the
behaviour of the decoder.

Currently, the following options are recognized. None of them is
enabled by default.
62
63 refuse_snappy
64
65 If set, the decoder will refuse Snappy-compressed input data. This can
66 be desirable for robustness. See the section "ROBUSTNESS" below.
67
68 refuse_objects
69
70 If set, the decoder will refuse deserializing any objects in the input
71 stream and instead throw an exception. Defaults to off. See the section
72 "ROBUSTNESS" below.
73
74 no_bless_objects
75
76 If set, the decoder will deserialize any objects in the input stream
77 but without blessing them. Defaults to off. See the section
78 "ROBUSTNESS" below.
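The difference between the two object-related options can be sketched
as follows. This is a minimal illustration, assuming Sereal::Encoder is
installed alongside the decoder; "My::Example" is a hypothetical class
name used only for the demonstration.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder;

# My::Example is a hypothetical class; it only exists as a bless target here.
my $blob = encode_sereal(bless({ id => 42 }, 'My::Example'));

# refuse_objects: a document that contains an object raises an exception.
my $strict  = Sereal::Decoder->new({ refuse_objects => 1 });
my $refused = !eval { $strict->decode($blob); 1 };

# no_bless_objects: the same data is returned as a plain, unblessed hash.
my $plain = Sereal::Decoder->new({ no_bless_objects => 1 })->decode($blob);

print "refused: ", ($refused ? "yes" : "no"), "\n";
print "blessed: ", (blessed($plain) ? "yes" : "no"), ", id: $plain->{id}\n";
```

Which of the two fits depends on whether the payload is still useful
without its class: "refuse_objects" rejects the whole document, while
"no_bless_objects" keeps the data and drops only the class association.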
79
80 validate_utf8
81
If set, the decoder will refuse invalid UTF-8 byte sequences. This is
off by default, but turning it on is strongly encouraged if you're
dealing with any data that has been encoded by an external source (e.g.
HTTP cookies).
86
87 max_recursion_depth
88
89 "Sereal::Decoder" is recursive. If you pass it a Sereal document that
90 is deeply nested, it will eventually exhaust the C stack. Therefore,
91 there is a limit on the depth of recursion that is accepted. It
92 defaults to 10000 nested calls. You may choose to override this value
93 with the "max_recursion_depth" option. Beware that setting it too high
94 can cause hard crashes.
95
96 Do note that the setting is somewhat approximate. Setting it to 10000
97 may break at somewhere between 9997 and 10003 nested structures
98 depending on their types.
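As a rough sketch of how the limit behaves, the following builds an
array nested 50 levels deep and decodes it once with a deliberately low
limit and once with the default. The depth of 50 and the limit of 10
are arbitrary values chosen for the illustration, and Sereal::Encoder
is assumed to be installed.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder;

# Build an array nested 50 levels deep: [[[ ... ]]]
my $nested = [];
my $cursor = $nested;
for (1 .. 50) {
    my $inner = [];
    push @$cursor, $inner;
    $cursor = $inner;
}
my $blob = encode_sereal($nested);

# A limit far below the actual depth makes the decoder throw.
my $shallow = Sereal::Decoder->new({ max_recursion_depth => 10 });
my $error;
eval { $shallow->decode($blob); 1 } or $error = $@;

# The default limit comfortably accepts the same document.
my $ok = eval { Sereal::Decoder->new->decode($blob); 1 };

print "low limit:     ", (defined $error ? "rejected" : "accepted"), "\n";
print "default limit: ", ($ok ? "accepted" : "rejected"), "\n";
```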
99
100 max_num_hash_entries
101
102 If set to a non-zero value (default: 0), then "Sereal::Decoder" will
103 refuse to deserialize any hash/dictionary (or hash-based object) with
104 more than that number of entries. This is to be able to respond quickly
105 to any future hash-collision attacks on Perl's hash function. Chances
106 are, you don't want or need this. For a gentle introduction to the
107 topic from the cryptographic point of view, see
108 <http://en.wikipedia.org/wiki/Collision_attack>.
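A minimal sketch of the option in use, assuming Sereal::Encoder is
installed; the limit of 5 and the hash sizes are arbitrary values picked
for the demonstration.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder;

my $limited = Sereal::Decoder->new({ max_num_hash_entries => 5 });

# A hash below the limit decodes normally.
my $small = $limited->decode(encode_sereal({ a => 1, b => 2 }));

# A hash with 10 entries exceeds the limit of 5 and is refused.
my %big      = map { ("key$_" => $_) } 1 .. 10;
my $rejected = !eval { $limited->decode(encode_sereal(\%big)); 1 };

print "small hash entries: ", scalar(keys %$small), "\n";
print "big hash: ", ($rejected ? "rejected" : "accepted"), "\n";
```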
109
110 incremental
111
112 If set to a non-zero value (default: 0), then "Sereal::Decoder" will
113 destructively parse Sereal documents out of a variable. Every time a
114 Sereal document is successfully parsed it is removed from the front of
115 the string it is parsed from.
116
117 This means you can do this:
118
    while (length $buffer) {
        my $data = decode_sereal($buffer, {incremental => 1});
    }
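A slightly fuller sketch of the same loop, with a buffer that is built
by concatenating three documents (assuming Sereal::Encoder is
installed). The variable names are illustrative only.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);

# Three independent Sereal documents, back to back in one string.
my $buffer = join '', map { encode_sereal($_) } ('first', [ 2, 3 ], { four => 4 });

my @docs;
while (length $buffer) {
    # Each successful call removes the parsed document from the front of $buffer.
    push @docs, decode_sereal($buffer, { incremental => 1 });
}

print "decoded ", scalar(@docs), " documents, ",
      length($buffer), " bytes left in the buffer\n";
```

Because the input is consumed destructively, keep a copy of the
original string if it is needed afterwards.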
122
123 alias_smallint
124
If set to a true value then "Sereal::Decoder" will share integers from
-16 to 15 (encoded as either SRL_HDR_NEG or SRL_HDR_POS) as read-only
aliases to a common SV.
128
129 The result of this may be significant space savings in data structures
130 with many integers in the specified range. The cost is more memory used
131 by the decoder and a very modest speed penalty when deserializing.
132
133 Note this option changes the structure of the dumped data. Use with
134 caution.
135
136 See also the "alias_varint_under" option.
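The effect on the decoded structure can be observed by comparing the
addresses of array elements. This is a sketch under the assumption that
Sereal::Encoder is installed; the value 7 is an arbitrary integer from
the aliased range.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder;

my $blob = encode_sereal([ 7, 7, 7 ]);

my $plain   = Sereal::Decoder->new->decode($blob);
my $aliased = Sereal::Decoder->new({ alias_smallint => 1 })->decode($blob);

# Comparing references numerically compares the addresses of the element SVs.
my $plain_shared   = \$plain->[0]   == \$plain->[1];
my $aliased_shared = \$aliased->[0] == \$aliased->[1];

# The shared SV is read-only, so modifying an element in place throws.
my $readonly = !eval { $aliased->[0]++; 1 };

print "default: elements ",        ($plain_shared   ? "shared" : "distinct"), "\n";
print "alias_smallint: elements ", ($aliased_shared ? "shared" : "distinct"), "\n";
print "aliased element is ",       ($readonly ? "read-only" : "writable"), "\n";
```

This is the structural change the note above warns about: code that
modifies decoded integers in place will fail on aliased values.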
137
138 alias_varint_under
139
If set to a true positive integer smaller than 16 then this option is
similar to setting "alias_smallint" and causes all integers from -16 to
15 to be shared as read-only aliases to the same SV, except that this
treatment ALSO applies to SRL_HDR_VARINT. If set to a value larger than
16 then this applies to all varints under the value set. (In general
SRL_HDR_VARINT is used only for integers larger than 15, and
SRL_HDR_NEG and SRL_HDR_POS are used for -16 to -1 and 0 to 15
respectively.)
148
149 In simple terms if you want to share values larger than 16 then you
150 should use this option, if you want to share only values in the -16 to
151 15 range then you should use the "alias_smallint" option instead.
152
The result of this may be significant space savings in data structures
with many integers in the desired range. The cost is more memory used
by the decoder and a very modest speed penalty when deserializing.
156
157 Note this option changes the structure of the dumped data. Use with
158 caution.
159
160 use_undef
161
If set to a true value then this causes any undef value to be
deserialized as PL_sv_undef. This may change the structure of the data
being dumped; do not enable this unless you know what you are doing.
165
166 set_readonly
167
168 If set to a true value then the output will be completely readonly
169 (deeply).
170
171 set_readonly_scalars
172
173 If set to a true value then scalars in the output will be readonly
174 (deeply). References won't be readonly.
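A small sketch of what read-only output means for calling code,
assuming Sereal::Encoder is installed. It only demonstrates that a
decoded scalar value can no longer be overwritten; the hash contents
are made up for the example.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder;

my $blob   = encode_sereal({ name => 'sereal' });
my $frozen = Sereal::Decoder->new({ set_readonly => 1 })->decode($blob);

# Reading works as usual ...
my $name = $frozen->{name};

# ... but overwriting a decoded scalar throws an exception.
my $locked = !eval { $frozen->{name} = 'other'; 1 };

print "name: $name, write attempt: ", ($locked ? "rejected" : "allowed"), "\n";
```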
175
177 decode
178 Given a byte string of Sereal data, the "decode" call deserializes that
179 data structure. The result can be obtained in one of two ways: "decode"
180 accepts a second parameter, which is a scalar to write the result to,
181 AND "decode" will return the resulting data structure.
182
183 The two are subtly different in case of data structures that contain
184 references to the root element. In that case, the return value will be
185 a (non-recursive) copy of the reference. The pass-in style is more
186 correct. In other words,
187
    $decoder->decode($sereal_string, my $out);
    # is almost the same but safer than:
    my $out = $decoder->decode($sereal_string);
191
This is an unfortunate side-effect of Perl's standard copy semantics of
assignment. Possibly one day we will have an alternative to this.
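For ordinary data without references to the root element, both calling
styles yield the same result. A minimal round trip, assuming
Sereal::Encoder is installed, might look like this:

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder;

my $decoder = Sereal::Decoder->new;
my $blob    = encode_sereal({ answer => 42, list => [ 1, 2, 3 ] });

# Pass-in style: the result is written into the second argument.
my $into;
$decoder->decode($blob, $into);

# Return style: the result is the return value of the call.
my $returned = $decoder->decode($blob);

print "pass-in: $into->{answer}, returned: $returned->{answer}\n";
```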
194
195 decode_with_header
196 Given a byte string of Sereal data, the "decode_with_header" call
197 deserializes that data structure as "decode" would do, however it also
198 decodes the optional user data structure that can be embedded into a
199 Sereal document, inside the header (see Sereal::Encoder::encode).
200
201 It accepts an optional second parameter, which is a scalar to write the
202 body to, and an optional third parameter, which is a scalar to write
203 the header to.
204
205 Regardless of the number of parameters received, "decode_with_header"
206 returns an ArrayRef containing the deserialized header, and the
207 deserialized body, in this order.
208
See "decode" for the subtle difference between the one-, two- and
three-parameter versions.

If there is no header in a Sereal document, the corresponding variable
or return value will be set to undef.
214
215 decode_only_header
216 Given a byte string of Sereal data, the "decode_only_header"
217 deserializes only the optional user data structure that can be embedded
218 into a Sereal document, inside the header (see
219 Sereal::Encoder::encode).
220
221 It accepts an optional second parameter, which is a scalar to write the
222 header to.
223
224 Regardless of the number of parameters received, "decode_only_header"
225 returns the resulting data structure.
226
See "decode" for the subtle difference between the one- and
two-parameter versions.

If there is no header in a Sereal document, the corresponding variable
or return value will be set to undef.
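The two header-aware methods can be sketched together. This assumes
Sereal::Encoder is installed and relies on its "encode" method accepting
the user header data as an optional second argument; the header and body
contents are made up for the illustration.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

my $encoder = Sereal::Encoder->new;
my $decoder = Sereal::Decoder->new;

# Body first, user header data second.
my $blob = $encoder->encode({ payload => [ 1, 2, 3 ] }, { schema => 'v1' });

# decode_with_header returns [ $header, $body ], in this order.
my ($header, $body) = @{ $decoder->decode_with_header($blob) };

# decode_only_header deserializes just the header and skips the body.
my $only_header = $decoder->decode_only_header($blob);

print "schema: $header->{schema}, payload items: ",
      scalar(@{ $body->{payload} }), "\n";
print "header only: $only_header->{schema}\n";
```

Reading only the header is useful for routing or filtering documents
without paying for deserialization of a large body.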
232
233 decode_with_offset
234 Same as the "decode" method, except as second parameter, you must pass
235 an integer offset into the input string, at which the decoding is to
236 start. The optional "pass-in" style scalar (see "decode" above) is
237 relegated to being the third parameter.
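As an illustration, an offset is handy when a Sereal document is
embedded after some other bytes. The "LEN=0042;" prefix below is a
hypothetical framing header invented for this sketch, and
Sereal::Encoder is assumed to be installed.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder;

my $decoder = Sereal::Decoder->new;

# Hypothetical framing bytes in front of the actual Sereal document.
my $prefix  = 'LEN=0042;';
my $message = $prefix . encode_sereal({ ok => 1 });

# Start decoding right after the prefix instead of copying a substring.
my $data = $decoder->decode_with_offset($message, length $prefix);

print "ok: $data->{ok}\n";
```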
238
239 decode_only_header_with_offset
240 Same as the "decode_only_header" method, except as second parameter,
241 you must pass an integer offset into the input string, at which the
242 decoding is to start. The optional "pass-in" style scalar (see
243 "decode_only_header" above) is relegated to being the third parameter.
244
245 decode_with_header_and_offset
246 Same as the "decode_with_header" method, except as second parameter,
247 you must pass an integer offset into the input string, at which the
248 decoding is to start. The optional "pass-in" style scalars (see
249 "decode_with_header" above) are relegated to being the third and fourth
250 parameters.
251
252 bytes_consumed
253 After using the various "decode" methods documented previously,
254 "bytes_consumed" can return the number of bytes from the body of the
255 input string that were actually consumed by the decoder. That is, if
256 you append random garbage to a valid Sereal document, "decode" will
257 happily decode the data and ignore the garbage. If that is an error in
258 your use case, you can use "bytes_consumed" to catch it.
259
    my $out = $decoder->decode($sereal_string);
    if (length($sereal_string) != $decoder->bytes_consumed) {
        die "Not all input data was consumed!";
    }
264
Chances are that if you do this, you're violating the robustness
principle: "be strict in what you emit but lenient in what you accept".
267
268 You can also use this to deserialize a list of Sereal documents that is
269 concatenated into the same string (code not very robust...):
270
    my @out;
    my $pos = 0;
    eval {
        while (1) {
            push @out, $decoder->decode_with_offset($sereal_string, $pos);
            $pos += $decoder->bytes_consumed;
            last if $pos >= length($sereal_string)
                 or not $decoder->bytes_consumed;
        }
    };
281
282 As mentioned, only the bytes consumed from the body are considered. So
283 the following example is correct, as only the header is deserialized:
284
    my $header = $decoder->decode_only_header($sereal_string);
    my $count = $decoder->bytes_consumed;
    # $count is 0
288
289 decode_from_file
    Sereal::Decoder->decode_from_file($file);
    $decoder->decode_from_file($file);

Read and decode the file specified. If called in list context and
incremental mode is enabled, decodes all packets contained in the file
and returns a list; otherwise decodes the first (or only) packet in the
file. Accepts an optional "target" variable as a second argument.
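A minimal sketch of reading a single document back from disk. The file
is written by hand with the core File::Temp module so that the example
depends only on the decoder method described above; Sereal::Encoder is
assumed to be installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder;

# Write one Sereal document to a temporary file in binary mode.
my ($fh, $file) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} encode_sereal({ stored => 'on disk' });
close $fh or die "close failed: $!";

# Decode the first (and only) packet in the file.
my $decoder = Sereal::Decoder->new;
my $data    = $decoder->decode_from_file($file);

print "stored: $data->{stored}\n";
```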
297
298 looks_like_sereal
Performs some rudimentary checks to determine if the argument appears
to be a valid Sereal packet or not. These tests are not comprehensive
and a true result does not mean that the document is valid, merely that
it appears to be valid. On the other hand a false result is always
reliable.
304
305 The return of this method may be treated as a simple boolean but is in
306 fact a more complex return. When the argument does not look anything
307 like a Sereal document then the return is perl's FALSE, which has the
308 property of being string equivalent to "" and numerically equivalent to
309 0. However when the argument appears to be a UTF-8 encoded protocol 3
310 Sereal document (by noticing that the \xF3 in the magic string has been
311 replaced by \xC3\xB3) then it returns 0 (the number, which is string
312 equivalent to "0"), and otherwise returns the protocol version of the
313 document. This means you can write something like this:
314
    $type = Sereal::Decoder->looks_like_sereal($thing);
    if ($type eq '') {
        say "Not a Sereal document";
    } elsif ($type eq '0') {
        say "Possibly utf8 encoded Sereal document";
    } else {
        say "Sereal document version $type";
    }
323
324 For reference, Sereal's magic value is a four byte string which is
325 either "=srl" for protocol version 1 and 2 or "=\xF3rl" for protocol
326 version 3 and later. This function checks that the magic string
327 corresponds with the reported version number, as well as other checks,
328 which may be enhanced in the future.
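The magic string can be inspected directly. The following sketch
(assuming Sereal::Encoder is installed) looks at the first four bytes of
a freshly encoded document and runs the header check on both a real
document and an arbitrary string. Which protocol version is reported
depends on the installed encoder, so it is printed rather than assumed.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(scalar_looks_like_sereal);

my $blob  = encode_sereal([ 'hello' ]);
my $magic = substr($blob, 0, 4);

# Either magic value is legitimate, depending on the protocol version.
my $known_magic = ($magic eq "=srl" || $magic eq "=\xF3rl");

my $version = scalar_looks_like_sereal($blob);                     # protocol version
my $junk    = scalar_looks_like_sereal('definitely not sereal');   # false

print "magic recognized: ", ($known_magic ? "yes" : "no"), "\n";
print "reported protocol version: $version\n";
print "junk accepted: ", ($junk ? "yes" : "no"), "\n";
```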
329
330 Note that looks_like_sereal() may be called as a class or object
331 method, and may also be called as a single argument function. See the
332 related scalar_looks_like_sereal() for a version which may ONLY be
333 called as a function, not as a method (and which is typically much
334 faster).
335
337 sereal_decode_with_object
338 The functional interface that is equivalent to using "decode". Takes a
339 decoder object reference as first parameter, followed by a byte string
340 to deserialize. Optionally takes a third parameter, which is the
341 output scalar to write to. See the documentation for "decode" above for
342 details.
343
344 This functional interface is marginally faster than the OO interface
345 since it avoids method resolution overhead and, on sufficiently modern
346 Perl versions, can usually avoid subroutine call overhead. See
347 Sereal::Performance for a discussion on how to tune Sereal for maximum
348 performance if you need to.
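A sketch of the functional interface in a loop that reuses one decoder
object for several documents, assuming Sereal::Encoder is installed.
The record layout is invented for the example.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(sereal_decode_with_object);

# One decoder object, reused for every document.
my $decoder = Sereal::Decoder->new;
my @blobs   = map { encode_sereal({ n => $_ }) } 1 .. 3;

my $sum = 0;
for my $blob (@blobs) {
    my $record;
    sereal_decode_with_object($decoder, $blob, $record);
    $sum += $record->{n};
}

print "sum of n: $sum\n";
```

The gain over the method call is small per document, so it mostly
matters in tight loops that decode very many small documents.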
349
350 sereal_decode_with_header_with_object
351 The functional interface that is equivalent to using
352 "decode_with_header". Takes a decoder object reference as first
353 parameter, followed by a byte string to deserialize. Optionally takes
354 third and fourth parameters, which are the output scalars to write to.
355 See the documentation for "decode_with_header" above for details.
356
357 This functional interface is marginally faster than the OO interface
358 since it avoids method resolution overhead and, on sufficiently modern
359 Perl versions, can usually avoid subroutine call overhead. See
360 Sereal::Performance for a discussion on how to tune Sereal for maximum
361 performance if you need to.
362
363 sereal_decode_only_header_with_object
The functional interface that is equivalent to using
"decode_only_header". Takes a decoder object reference as first
parameter, followed by a byte string to deserialize. Optionally takes a
third parameter, which is the output scalar to write to. See the
documentation for "decode_only_header" above for details.
369
370 This functional interface is marginally faster than the OO interface
371 since it avoids method resolution overhead and, on sufficiently modern
372 Perl versions, can usually avoid subroutine call overhead. See
373 Sereal::Performance for a discussion on how to tune Sereal for maximum
374 performance if you need to.
375
376 sereal_decode_only_header_with_offset_with_object
377 The functional interface that is equivalent to using
378 "decode_only_header_with_offset". Same as the
379 "sereal_decode_only_header_with_object" function, except as the third
380 parameter, you must pass an integer offset into the input string, at
381 which the decoding is to start. The optional "pass-in" style scalar
382 (see "sereal_decode_only_header_with_object" above) is relegated to
383 being the fourth parameter.
384
385 This functional interface is marginally faster than the OO interface
386 since it avoids method resolution overhead and, on sufficiently modern
387 Perl versions, can usually avoid subroutine call overhead. See
388 Sereal::Performance for a discussion on how to tune Sereal for maximum
389 performance if you need to.
390
391 sereal_decode_with_header_and_offset_with_object
392 The functional interface that is equivalent to using
393 "decode_with_header_and_offset". Same as the
394 "sereal_decode_with_header_with_object" function, except as the third
395 parameter, you must pass an integer offset into the input string, at
396 which the decoding is to start. The optional "pass-in" style scalars
397 (see "sereal_decode_with_header_with_object" above) are relegated to
398 being the fourth and fifth parameters.
399
400 This functional interface is marginally faster than the OO interface
401 since it avoids method resolution overhead and, on sufficiently modern
402 Perl versions, can usually avoid subroutine call overhead. See
403 Sereal::Performance for a discussion on how to tune Sereal for maximum
404 performance if you need to.
405
406 sereal_decode_with_offset_with_object
The functional interface that is equivalent to using
"decode_with_offset". Same as the "sereal_decode_with_object"
function, except as the third parameter, you must pass an integer
offset into the input string, at which the decoding is to start. The
optional "pass-in" style scalar (see "sereal_decode_with_object" above)
is relegated to being the fourth parameter.
413
414 This functional interface is marginally faster than the OO interface
415 since it avoids method resolution overhead and, on sufficiently modern
416 Perl versions, can usually avoid subroutine call overhead. See
417 Sereal::Performance for a discussion on how to tune Sereal for maximum
418 performance if you need to.
419
420 decode_sereal
421 The functional interface that is equivalent to using "new" and
422 "decode". Expects a byte string to deserialize as first argument,
423 optionally followed by a hash reference of options (see documentation
424 for "new()"). Finally, "decode_sereal" supports a third parameter,
425 which is the output scalar to write to. See the documentation for
426 "decode" above for details.
427
428 This functional interface is significantly slower than the OO interface
429 since it cannot reuse the decoder object.
430
431 decode_sereal_with_header_data
The functional interface that is equivalent to using "new" and
"decode_with_header". Expects a byte string to deserialize as first
argument, optionally followed by a hash reference of options (see
documentation for "new()"). Finally, "decode_sereal_with_header_data"
supports third and fourth parameters, which are the output scalars to
write to. See the documentation for "decode_with_header" above for
details.
438
439 This functional interface is significantly slower than the OO interface
440 since it cannot reuse the decoder object.
441
442 scalar_looks_like_sereal
443 The functional interface that is equivalent to using
444 "looks_like_sereal".
445
Note that this version cannot be called as a method. It is normally
executed as a custom opcode. As such, errors about its usage may be
caught at compile time, and it should be much faster than
looks_like_sereal.
450
ROBUSTNESS
This implementation of a Sereal decoder tries to be as robust to
invalid input data as reasonably possible. This means that it should
never (though read on) segfault. It may, however, cause a large malloc
to fail. Generally speaking, invalid data should cause a Perl-trappable
exception. The one exception is that for Snappy-compressed Sereal
documents, the Snappy library may cause segmentation faults (invalid
reads or writes). This should only be a problem if you do not checksum
your data (internal checksum support is a To-Do) or if you accept data
from potentially malicious sources.
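Trapping such exceptions uses ordinary Perl "eval". The sketch below,
which assumes Sereal::Encoder is installed, feeds the decoder a string
that is not Sereal at all and a valid document cut in half; the input
values are arbitrary.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder;

my $decoder   = Sereal::Decoder->new;
my $blob      = encode_sereal({ list => [ 1 .. 100 ], name => 'x' x 50 });
my $truncated = substr($blob, 0, int(length($blob) / 2));

my %outcome;
for my $case ([ garbage => 'not a sereal document' ], [ truncated => $truncated ]) {
    my ($label, $input) = @$case;

    # Invalid input raises a normal Perl exception that eval can catch.
    my $ok = eval { $decoder->decode($input); 1 };
    $outcome{$label} = $ok ? 'decoded' : 'rejected';
}

print "$_: $outcome{$_}\n" for sort keys %outcome;
```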
461
It requires a lot of run-time boundary checks to prevent decoder
segmentation faults on invalid data. We implemented them in the
lightest way possible. Adding robustness against running out of memory
would cause a very significant run-time overhead. In most cases of
random garbage (with valid header no less) when a malloc() fails due to
invalid data, the problem was caused by a very large array or string
length. This kind of very large malloc can then fail, being trappable
from Perl. Only when a packet causes many repeated allocations do you
risk causing a hard OOM error from the kernel that cannot be trapped
because Perl may require some small allocations to succeed before the
now-invalid memory is released. It is at least not entirely trivial to
craft a Sereal document that causes this behaviour.
474
Finally, deserializing proper objects is potentially a problem because
classes can define a destructor. Thus, the data fed to the decoder can
cause the (deferred) execution of any destructor in your application.
That's why the "refuse_objects" option exists, and the
"no_bless_objects" option can be used for the same purpose. Later on,
we may or may not provide a facility to whitelist classes. Furthermore,
if the encoder emitted any objects using "FREEZE" callbacks, the "THAW"
class method may be invoked on the respective classes. If you can't
trust the source of your Sereal documents, you may want to use the
"refuse_objects" option. For more details on the "FREEZE/THAW"
mechanism, please refer to Sereal::Encoder.
486
488 Please refer to the Sereal::Performance document that has more detailed
489 information about Sereal performance and tuning thereof.
490
"Sereal::Decoder" is thread-safe on Perl 5.8.7 and higher. This means
"thread-safe" in the sense that if you create a new thread, all
"Sereal::Decoder" objects will become a reference to undef in the new
thread. This might change in a future release to become a full clone of
the decoder object.
497
499 For reporting bugs, please use the github bug tracker at
500 <http://github.com/Sereal/Sereal/issues>.
501
502 For support and discussion of Sereal, there are two Google Groups:
503
504 Announcements around Sereal (extremely low volume):
505 <https://groups.google.com/forum/?fromgroups#!forum/sereal-announce>
506
507 Sereal development list:
508 <https://groups.google.com/forum/?fromgroups#!forum/sereal-dev>
509
511 Yves Orton <demerphq@gmail.com>
512
513 Damian Gryski
514
515 Steffen Mueller <smueller@cpan.org>
516
517 Rafaël Garcia-Suarez
518
519 Ævar Arnfjörð Bjarmason <avar@cpan.org>
520
521 Tim Bunce
522
523 Daniel Dragan <bulkdd@cpan.org> (Windows support and bugfixes)
524
525 Zefram
526
527 Borislav Nikolov
528
529 Ivan Kruglov <ivan.kruglov@yahoo.com>
530
531 Eric Herman <eric@freesa.org>
532
Some inspiration and code were taken from Marc Lehmann's excellent
JSON::XS module due to obvious overlap in problem domain.
535
537 This module was originally developed for Booking.com. With approval
538 from Booking.com, this module was generalized and published on CPAN,
539 for which the authors would like to express their gratitude.
540
Copyright (C) 2012, 2013, 2014 by Steffen Mueller
Copyright (C) 2012, 2013, 2014 by Yves Orton
544
545 The license for the code in this distribution is the following, with
546 the exceptions listed below:
547
548 This library is free software; you can redistribute it and/or modify it
549 under the same terms as Perl itself.
550
551 Except portions taken from Marc Lehmann's code for the JSON::XS module,
552 which is licensed under the same terms as this module. (Many thanks to
553 Marc for inspiration, and code.)
554
Also except the code for the Snappy compression library, whose license
is reproduced below and which, to the best of our knowledge, is
compatible with this module's license. The license for the enclosed
Snappy code is:
559
560 Copyright 2011, Google Inc.
561 All rights reserved.
562
563 Redistribution and use in source and binary forms, with or without
564 modification, are permitted provided that the following conditions are
565 met:
566
567 * Redistributions of source code must retain the above copyright
568 notice, this list of conditions and the following disclaimer.
569 * Redistributions in binary form must reproduce the above
570 copyright notice, this list of conditions and the following disclaimer
571 in the documentation and/or other materials provided with the
572 distribution.
573 * Neither the name of Google Inc. nor the names of its
574 contributors may be used to endorse or promote products derived from
575 this software without specific prior written permission.
576
577 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
578 "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
579 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
580 A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
581 OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
582 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
583 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
584 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
585 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
586 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
587 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
588
589
590
perl v5.30.1 2020-02-04 Sereal::Decoder(3)