Sereal::Decoder(3)    User Contributed Perl Documentation   Sereal::Decoder(3)

NAME

       Sereal::Decoder - Fast, compact, powerful binary deserialization

SYNOPSIS

         use Sereal::Decoder
           qw(decode_sereal sereal_decode_with_object scalar_looks_like_sereal);

         my $decoder = Sereal::Decoder->new({...options...});

         my $structure;
         $decoder->decode($blob, $structure); # deserializes into $structure

         # or if you don't have references to the top-level structure, this works, too:
         $structure = $decoder->decode($blob);

         # alternatively, the functional interface (see Sereal::Performance):
         sereal_decode_with_object($decoder, $blob, $structure);
         $structure = sereal_decode_with_object($decoder, $blob);

         # much slower functional interface with no persistent objects:
         decode_sereal($blob, {... options ...}, $structure);
         $structure = decode_sereal($blob, {... options ...});

         # Not a full validation, but just a quick check for a reasonable header:
         my $is_likely_sereal = scalar_looks_like_sereal($some_string);
         # or:
         $is_likely_sereal = $decoder->looks_like_sereal($some_string);

DESCRIPTION

       This library implements a deserializer for an efficient, compact-output,
       and feature-rich binary protocol called Sereal.  Its sister module
       Sereal::Encoder implements an encoder for this format.  The two are
       released separately to allow for independent and safer upgrading.

       The Sereal protocol versions that are compatible with this decoder
       implementation are currently protocol versions 1, 2, 3 and 4. As it
       stands, it will refuse to decode future versions of the protocol, but
       if necessary there is likely going to be an option to decode the parts
       of the input that are compatible with version 4 of the protocol. The
       protocol was designed to allow for this.

       The protocol specification and many other bits of documentation can be
       found in the GitHub repository. Right now, the specification is at
       <https://github.com/Sereal/Sereal/blob/master/sereal_spec.pod>, there
       is a discussion of the design objectives in
       <https://github.com/Sereal/Sereal/blob/master/README.pod>, and the
       output of our benchmarks can be seen at
       <https://github.com/Sereal/Sereal/wiki/Sereal-Comparison-Graphs>.

CLASS METHODS

   new
       Constructor. Optionally takes a hash reference as first parameter. This
       hash reference may contain any number of options that influence the
       behaviour of the decoder.

       Currently, the following options are recognized; none of them are on
       by default.

       refuse_snappy

       If set, the decoder will refuse Snappy-compressed input data. This can
       be desirable for robustness. See the section "ROBUSTNESS" below.

       refuse_objects

       If set, the decoder will refuse to deserialize any objects in the input
       stream and will instead throw an exception. Defaults to off. See the
       section "ROBUSTNESS" below.

       no_bless_objects

       If set, the decoder will deserialize any objects in the input stream
       but without blessing them. Defaults to off. See the section
       "ROBUSTNESS" below.

       validate_utf8

       If set, the decoder will refuse invalid UTF-8 byte sequences. This is
       off by default, but turning it on is strongly encouraged if you're
       dealing with any data that has been encoded by an external source (e.g.
       HTTP cookies).

       max_recursion_depth

       "Sereal::Decoder" is recursive. If you pass it a Sereal document that
       is deeply nested, it will eventually exhaust the C stack. Therefore,
       there is a limit on the depth of recursion that is accepted. It
       defaults to 10000 nested calls. You may choose to override this value
       with the "max_recursion_depth" option.  Beware that setting it too high
       can cause hard crashes.

       Do note that the setting is somewhat approximate. Setting it to 10000
       may trigger somewhere between 9997 and 10003 nested structures,
       depending on their types.

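       The idea behind this limit can be illustrated in plain Perl. The
       sketch below is not the decoder's C implementation: the walk()
       function and its max_depth argument are hypothetical, invented for
       this example. It counts the nesting depth while recursing and throws a
       trappable exception once the limit is exceeded, rather than recursing
       until the stack runs out.

```perl
use strict;
use warnings;

# Hypothetical depth-limited walk over nested array/hash references.
# This only illustrates the idea; it is not Sereal's C code.
sub walk {
    my ($node, $depth, $max_depth) = @_;
    # Refuse (with a trappable exception) once nesting exceeds the limit.
    die "maximum recursion depth ($max_depth) exceeded\n"
        if $depth > $max_depth;
    if (ref $node eq 'ARRAY') {
        walk($_, $depth + 1, $max_depth) for @$node;
    } elsif (ref $node eq 'HASH') {
        walk($_, $depth + 1, $max_depth) for values %$node;
    }
    return;
}

# Build 50 nested array references: [[[ ... ]]]
my $nested = [];
$nested = [$nested] for 1 .. 49;

my $accepted_at_100 = eval { walk($nested, 1, 100); 1 } ? 1 : 0;
my $accepted_at_10  = eval { walk($nested, 1, 10);  1 } ? 1 : 0;

print "limit 100: ", ($accepted_at_100 ? "accepted" : "refused"), "\n";
print "limit 10:  ", ($accepted_at_10  ? "accepted" : "refused"), "\n";
```

       Because the refusal is an ordinary die(), the caller can catch it with
       eval and reject the input instead of crashing.
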
       max_num_hash_entries

       If set to a non-zero value (default: 0), then "Sereal::Decoder" will
       refuse to deserialize any hash/dictionary (or hash-based object) with
       more than that number of entries. This is to be able to respond quickly
       to any future hash-collision attacks on Perl's hash function, and also
       to memory exhaustion attacks on Sereal itself. For a gentle
       introduction to the topic from the cryptographic point of view, see
       <http://en.wikipedia.org/wiki/Collision_attack>.

       max_num_array_entries

       If set to a non-zero value (default: 0), then "Sereal::Decoder" will
       refuse to deserialize any array with more than that number of entries.
       This is to be able to respond quickly to any future memory exhaustion
       attacks on Sereal.

       max_string_length

       If set to a non-zero value (default: 0), then "Sereal::Decoder" will
       refuse to deserialize any string with more than that number of
       characters.  This is to be able to respond quickly to any future memory
       exhaustion attacks on Sereal.

       max_uncompressed_size

       If set to a non-zero value (default: 0), then "Sereal::Decoder" will
       refuse to deserialize any blob whose uncompressed size exceeds that
       value.  This is to be able to respond quickly to any future memory
       exhaustion attacks on Sereal.

       incremental

       If set to a non-zero value (default: 0), then "Sereal::Decoder" will
       destructively parse Sereal documents out of a variable. Every time a
       Sereal document is successfully parsed, it is removed from the front of
       the string it is parsed from.

       This means you can do this:

           my @documents;
           while (length $buffer) {
               # each call strips one complete document off the front of $buffer
               push @documents, decode_sereal($buffer, {incremental => 1});
           }

       alias_smallint

       If set to a true value then "Sereal::Decoder" will share integers from
       -16 to 15 (encoded as either SRL_HDR_NEG or SRL_HDR_POS) as read-only
       aliases to a common SV.

       The result of this may be significant space savings in data structures
       with many integers in the specified range. The cost is more memory used
       by the decoder and a very modest speed penalty when deserializing.

       Note this option changes the structure of the dumped data. Use with
       caution.

       See also the "alias_varint_under" option.

       alias_varint_under

       If set to a positive integer smaller than 16, then this option is
       similar to setting "alias_smallint" and causes all integers from -16 to
       15 to be shared as read-only aliases to the same SV, except that this
       treatment ALSO applies to SRL_HDR_VARINT. If set to a value larger than
       16, then this applies to all varints under the value set. (In general
       SRL_HDR_VARINT is used only for integers larger than 15, and
       SRL_HDR_NEG and SRL_HDR_POS are used for -16 to -1 and 0 to 15
       respectively.)

       In simple terms, if you want to share values larger than 15 then you
       should use this option; if you want to share only values in the -16 to
       15 range then you should use the "alias_smallint" option instead.

       The result of this may be significant space savings in data structures
       with many integers in the desired range. The cost is more memory used by
       the decoder and a very modest speed penalty when deserializing.

       Note this option changes the structure of the dumped data. Use with
       caution.

       use_undef

       If set to a true value then any undef value will be deserialized as
       PL_sv_undef. This may change the structure of the data structure being
       dumped; do not enable this unless you know what you are doing.

       set_readonly

       If set to a true value then the output will be completely readonly
       (deeply).

       set_readonly_scalars

       If set to a true value then scalars in the output will be readonly
       (deeply).  References won't be readonly.

INSTANCE METHODS

   decode
       Given a byte string of Sereal data, the "decode" call deserializes that
       data structure. The result can be obtained in one of two ways: "decode"
       accepts a second parameter, which is a scalar to write the result to,
       AND "decode" will return the resulting data structure.

       The two are subtly different in case of data structures that contain
       references to the root element. In that case, the return value will be
       a (non-recursive) copy of the reference. The pass-in style is more
       correct.  In other words,

         $decoder->decode($sereal_string, my $out);
         # is almost the same but safer than:
         my $out = $decoder->decode($sereal_string);

       This is an unfortunate side-effect of Perl's standard copy semantics of
       assignment. Possibly one day we will have an alternative to this.

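       The copy semantics involved can be reproduced in plain Perl, without
       the decoder. The sketch below builds a root scalar whose contents refer
       back to that scalar, then copies it the way "my $out = ..." copies a
       return value. It only models the effect described above; it does not
       call Sereal::Decoder.

```perl
use strict;
use warnings;

# $root holds an array reference whose only element points back at the
# scalar $root itself. (This deliberately creates a reference cycle.)
my $root;
$root = [ \$root ];

# Assignment copies the value (the array reference) into a new scalar,
# just as "my $out = $decoder->decode(...)" copies the return value.
my $copy = $root;

# Both scalars share the same array, whose back-reference still points
# at $root, so only $root is truly self-referential.
print "root refers to itself: ", (\$root == $root->[0] ? "yes" : "no"), "\n";
print "copy refers to itself: ", (\$copy == $copy->[0] ? "yes" : "no"), "\n";
```

       The copy ends up holding a back-reference to a different scalar than
       itself. The pass-in style avoids this because the result is written
       into the caller's own scalar instead of being copied out of it.
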
   decode_with_header
       Given a byte string of Sereal data, the "decode_with_header" call
       deserializes that data structure as "decode" would do, however it also
       decodes the optional user data structure that can be embedded into a
       Sereal document, inside the header (see Sereal::Encoder::encode).

       It accepts an optional second parameter, which is a scalar to write the
       body to, and an optional third parameter, which is a scalar to write
       the header to.

       Regardless of the number of parameters received, "decode_with_header"
       returns an ArrayRef containing the deserialized header, and the
       deserialized body, in this order.

       See "decode" for the subtle difference between the one-, two- and
       three-parameter versions.

       If there is no header in a Sereal document, the corresponding variable
       or return value will be set to undef.

   decode_only_header
       Given a byte string of Sereal data, the "decode_only_header" call
       deserializes only the optional user data structure that can be embedded
       into a Sereal document, inside the header (see
       Sereal::Encoder::encode).

       It accepts an optional second parameter, which is a scalar to write the
       header to.

       Regardless of the number of parameters received, "decode_only_header"
       returns the resulting data structure.

       See "decode" for the subtle difference between the one- and
       two-parameter versions.

       If there is no header in a Sereal document, the corresponding variable
       or return value will be set to undef.

   decode_with_offset
       Same as the "decode" method, except that as the second parameter, you
       must pass an integer offset into the input string, at which the
       decoding is to start. The optional "pass-in" style scalar (see "decode"
       above) is relegated to being the third parameter.

   decode_only_header_with_offset
       Same as the "decode_only_header" method, except that as the second
       parameter, you must pass an integer offset into the input string, at
       which the decoding is to start. The optional "pass-in" style scalar
       (see "decode_only_header" above) is relegated to being the third
       parameter.

   decode_with_header_and_offset
       Same as the "decode_with_header" method, except that as the second
       parameter, you must pass an integer offset into the input string, at
       which the decoding is to start. The optional "pass-in" style scalars
       (see "decode_with_header" above) are relegated to being the third and
       fourth parameters.

   bytes_consumed
       After using the various "decode" methods documented previously,
       "bytes_consumed" returns the number of bytes from the body of the
       input string that were actually consumed by the decoder. That is, if
       you append random garbage to a valid Sereal document, "decode" will
       happily decode the data and ignore the garbage. If that is an error in
       your use case, you can use "bytes_consumed" to catch it.

         my $out = $decoder->decode($sereal_string);
         if (length($sereal_string) != $decoder->bytes_consumed) {
           die "Not all input data was consumed!";
         }

       Chances are that if you do this, you're violating the UNIX philosophy
       of "be strict in what you emit but lenient in what you accept".

       You can also use this to deserialize a list of Sereal documents that is
       concatenated into the same string (code not very robust...):

         my @out;
         my $pos = 0;
         eval {
           while (1) {
             push @out, $decoder->decode_with_offset($sereal_string, $pos);
             $pos += $decoder->bytes_consumed;
             last if $pos >= length($sereal_string)
                  or not $decoder->bytes_consumed;
           }
         };

       As mentioned, only the bytes consumed from the body are considered. So
       the following example is correct, as only the header is deserialized:

         my $header = $decoder->decode_only_header($sereal_string);
         my $count = $decoder->bytes_consumed;
         # $count is 0

   decode_from_file
           Sereal::Decoder->decode_from_file($file);
           $decoder->decode_from_file($file);

       Read and decode the specified file. If called in list context and
       incremental mode is enabled, it decodes all packets contained in the
       file and returns a list; otherwise it decodes the first (or only)
       packet in the file. Accepts an optional "target" variable as a second
       argument.

   looks_like_sereal
       Performs a rudimentary check to determine whether the argument appears
       to be a valid Sereal packet. These tests are not comprehensive and a
       true result does not mean that the document is valid, merely that it
       appears to be valid. On the other hand, a false result is always
       reliable.

       The return value of this method may be treated as a simple boolean but
       in fact carries more information. When the argument does not look
       anything like a Sereal document then the return value is Perl's FALSE,
       which has the property of being string equivalent to "" and
       numerically equivalent to 0. However, when the argument appears to be
       a UTF-8 encoded protocol 3 Sereal document (by noticing that the \xF3
       in the magic string has been replaced by \xC3\xB3) then it returns 0
       (the number, which is string equivalent to "0"), and otherwise returns
       the protocol version of the document. This means you can write
       something like this:

           use feature 'say';

           my $type = Sereal::Decoder->looks_like_sereal($thing);
           if ($type eq '') {
               say "Not a Sereal document";
           } elsif ($type eq '0') {
               say "Possibly utf8 encoded Sereal document";
           } else {
               say "Sereal document version $type";
           }

       For reference, Sereal's magic value is a four-byte string which is
       either "=srl" for protocol versions 1 and 2 or "=\xF3rl" for protocol
       version 3 and later. This function checks that the magic string
       corresponds with the reported version number, and performs other
       checks which may be enhanced in the future.

       Note that looks_like_sereal() may be called as a class or object
       method, and may also be called as a single-argument function. See the
       related scalar_looks_like_sereal() for a version which may ONLY be
       called as a function, not as a method (and which is typically much
       faster).

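       To make the header check described above more concrete, here is a
       rough pure-Perl approximation. The function name
       sketch_looks_like_sereal() is hypothetical. The sketch assumes, as in
       the Sereal protocol specification, that the byte following the magic
       string holds the protocol version in its low four bits, and it assumes
       a minimum length of five bytes. The module's own check is more
       thorough.

```perl
use strict;
use warnings;

# Hypothetical, much simplified approximation of the header check; the
# module's own looks_like_sereal() is more thorough.
# Returns '' (not Sereal), 0 (looks like a UTF-8 encoded protocol 3+
# document) or the protocol version found in the header.
sub sketch_looks_like_sereal {
    my ($str) = @_;
    # Assumed minimum: four magic bytes plus one version/type byte.
    return '' unless defined $str && length($str) >= 5;

    # "=\xF3rl" whose \xF3 byte was UTF-8 encoded as \xC3\xB3.
    return 0 if substr($str, 0, 5) eq "=\xC3\xB3rl";

    my $magic   = substr($str, 0, 4);
    # The low four bits of the byte after the magic hold the version.
    my $version = ord(substr($str, 4, 1)) & 0x0F;

    # The magic string must agree with the reported version.
    return $version if $magic eq "=srl"    && ($version == 1 || $version == 2);
    return $version if $magic eq "=\xF3rl" && $version >= 3;
    return '';
}

for my $doc ("=srl\x02\x00", "=\xF3rl\x03\x00", "=\xC3\xB3rl\x03", "=srl\x03\x00") {
    my $type  = sketch_looks_like_sereal($doc);
    my $label = $type eq ''  ? 'not Sereal'
              : $type eq '0' ? 'possibly UTF-8 encoded'
              :                "version $type";
    print "$label\n";
}
```

       The last sample is rejected because its "=srl" magic claims protocol 1
       or 2 while its version byte says 3, which is the kind of mismatch the
       paragraph above describes.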

EXPORTABLE FUNCTIONS

   sereal_decode_with_object
       The functional interface that is equivalent to using "decode". Takes a
       decoder object reference as first parameter, followed by a byte string
       to deserialize.  Optionally takes a third parameter, which is the
       output scalar to write to. See the documentation for "decode" above for
       details.

       This functional interface is marginally faster than the OO interface
       since it avoids method resolution overhead and, on sufficiently modern
       Perl versions, can usually avoid subroutine call overhead. See
       Sereal::Performance for a discussion on how to tune Sereal for maximum
       performance if you need to.

   sereal_decode_with_header_with_object
       The functional interface that is equivalent to using
       "decode_with_header".  Takes a decoder object reference as first
       parameter, followed by a byte string to deserialize. Optionally takes
       third and fourth parameters, which are the output scalars to write to.
       See the documentation for "decode_with_header" above for details.

       This functional interface is marginally faster than the OO interface
       since it avoids method resolution overhead and, on sufficiently modern
       Perl versions, can usually avoid subroutine call overhead. See
       Sereal::Performance for a discussion on how to tune Sereal for maximum
       performance if you need to.

   sereal_decode_only_header_with_object
       The functional interface that is equivalent to using
       "decode_only_header".  Takes a decoder object reference as first
       parameter, followed by a byte string to deserialize. Optionally takes a
       third parameter, which is the output scalar to write to.  See the
       documentation for "decode_only_header" above for details.

       This functional interface is marginally faster than the OO interface
       since it avoids method resolution overhead and, on sufficiently modern
       Perl versions, can usually avoid subroutine call overhead. See
       Sereal::Performance for a discussion on how to tune Sereal for maximum
       performance if you need to.

   sereal_decode_only_header_with_offset_with_object
       The functional interface that is equivalent to using
       "decode_only_header_with_offset".  Same as the
       "sereal_decode_only_header_with_object" function, except that as the
       third parameter, you must pass an integer offset into the input string,
       at which the decoding is to start. The optional "pass-in" style scalar
       (see "sereal_decode_only_header_with_object" above) is relegated to
       being the fourth parameter.

       This functional interface is marginally faster than the OO interface
       since it avoids method resolution overhead and, on sufficiently modern
       Perl versions, can usually avoid subroutine call overhead. See
       Sereal::Performance for a discussion on how to tune Sereal for maximum
       performance if you need to.

   sereal_decode_with_header_and_offset_with_object
       The functional interface that is equivalent to using
       "decode_with_header_and_offset".  Same as the
       "sereal_decode_with_header_with_object" function, except that as the
       third parameter, you must pass an integer offset into the input string,
       at which the decoding is to start. The optional "pass-in" style scalars
       (see "sereal_decode_with_header_with_object" above) are relegated to
       being the fourth and fifth parameters.

       This functional interface is marginally faster than the OO interface
       since it avoids method resolution overhead and, on sufficiently modern
       Perl versions, can usually avoid subroutine call overhead. See
       Sereal::Performance for a discussion on how to tune Sereal for maximum
       performance if you need to.

   sereal_decode_with_offset_with_object
       The functional interface that is equivalent to using
       "decode_with_offset".  Same as the "sereal_decode_with_object"
       function, except that as the third parameter, you must pass an integer
       offset into the input string, at which the decoding is to start. The
       optional "pass-in" style scalar (see "sereal_decode_with_object" above)
       is relegated to being the fourth parameter.

       This functional interface is marginally faster than the OO interface
       since it avoids method resolution overhead and, on sufficiently modern
       Perl versions, can usually avoid subroutine call overhead. See
       Sereal::Performance for a discussion on how to tune Sereal for maximum
       performance if you need to.

   decode_sereal
       The functional interface that is equivalent to using "new" and
       "decode".  Expects a byte string to deserialize as first argument,
       optionally followed by a hash reference of options (see documentation
       for "new()"). Finally, "decode_sereal" supports a third parameter,
       which is the output scalar to write to. See the documentation for
       "decode" above for details.

       This functional interface is significantly slower than the OO interface
       since it cannot reuse the decoder object.

   decode_sereal_with_header_data
       The functional interface that is equivalent to using "new" and
       "decode_with_header".  Expects a byte string to deserialize as first
       argument, optionally followed by a hash reference of options (see
       documentation for "new()"). Finally, "decode_sereal_with_header_data"
       supports third and fourth parameters, which are the output scalars to
       write to. See the documentation for "decode_with_header" above for
       details.

       This functional interface is significantly slower than the OO interface
       since it cannot reuse the decoder object.

   scalar_looks_like_sereal
       The functional interface that is equivalent to using
       "looks_like_sereal".

       Note that this version cannot be called as a method. It is normally
       executed as a custom opcode, so errors in its usage may be caught at
       compile time, and it should be much faster than looks_like_sereal.

ROBUSTNESS

       This implementation of a Sereal decoder tries to be as robust to
       invalid input data as reasonably possible. This means that it should
       never (though read on) segfault. It may, however, cause a large malloc
       to fail. Generally speaking, invalid data should cause a Perl-trappable
       exception. The one exception is that for Snappy-compressed Sereal
       documents, the Snappy library may cause segmentation faults (invalid
       reads or writes).  This should only be a problem if you do not checksum
       your data (internal checksum support is a To-Do) or if you accept data
       from potentially malicious sources.

       It requires a lot of run-time boundary checks to prevent decoder
       segmentation faults on invalid data. We implemented them in the
       lightest way possible. Adding robustness against running out of memory
       would cause a very significant run-time overhead. In most cases of
       random garbage (with valid header no less) when a malloc() fails due to
       invalid data, the problem was caused by a very large array or string
       length. Such a very large malloc then fails in a way that is trappable
       from Perl. Only when a packet causes many repeated allocations do you
       risk causing a hard OOM error from the kernel that cannot be trapped,
       because Perl may require some small allocations to succeed before the
       now-invalid memory is released. It is at least not entirely trivial to
       craft a Sereal document that causes this behaviour.

       Finally, deserializing proper objects is potentially a problem because
       classes can define a destructor. Thus, the data fed to the decoder can
       cause the (deferred) execution of any destructor in your application.
       That's why the "refuse_objects" option exists and what the
       "no_bless_objects" option can be used for as well. Later on, we may or
       may not provide a facility to whitelist classes. Furthermore, if the
       encoder emitted any objects using "FREEZE" callbacks, the "THAW" class
       method may be invoked on the respective classes. If you can't trust the
       source of your Sereal documents, you may want to use the
       "refuse_objects" option. For more details on the "FREEZE/THAW"
       mechanism, please refer to Sereal::Encoder.

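       As a sketch of how these options might be combined for input from an
       untrusted source, the example below builds a decoder using only options
       documented under "new" above. The numeric limits are illustrative
       assumptions, not values recommended by the Sereal authors, and
       decode_untrusted() is a hypothetical helper that traps the exception
       the decoder throws on invalid or refused input.

```perl
use strict;
use warnings;
use Sereal::Decoder;

# Every option below is documented under "new"; the numbers are
# illustrative assumptions and should be tuned to your own data.
my $decoder = Sereal::Decoder->new({
    refuse_objects        => 1,          # throw instead of creating objects
    refuse_snappy         => 1,          # reject Snappy-compressed input
    validate_utf8         => 1,          # reject invalid UTF-8 byte sequences
    max_recursion_depth   => 1_000,      # nesting limit (default is 10000)
    max_num_hash_entries  => 100_000,
    max_num_array_entries => 100_000,
    max_string_length     => 1_000_000,  # measured in characters
    max_uncompressed_size => 16_000_000,
});

# Hypothetical helper: returns (1, $data) on success or (0, $error)
# when the decoder throws, e.g. on garbage or on refused content.
sub decode_untrusted {
    my ($blob) = @_;
    my $data;
    my $ok = eval { $decoder->decode($blob, $data); 1 };
    return $ok ? (1, $data) : (0, $@);
}

my ($ok, $err) = decode_untrusted("definitely not a Sereal document");
if ($ok) { print "decoded\n" } else { print "refused: $err" }
```

       Wrapping the call in eval matches the behaviour described above:
       invalid or refused input surfaces as an ordinary Perl exception.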

PERFORMANCE

       Please refer to the Sereal::Performance document that has more detailed
       information about Sereal performance and tuning thereof.

THREAD-SAFETY

       "Sereal::Decoder" is thread-safe on Perl 5.8.7 and higher. This means
       "thread-safe" in the sense that if you create a new thread, all
       "Sereal::Decoder" objects will become a reference to undef in the new
       thread. This might change in a future release to become a full clone of
       the decoder object.

BUGS, CONTACT AND SUPPORT

       For reporting bugs, please use the GitHub bug tracker at
       <http://github.com/Sereal/Sereal/issues>.

       For support and discussion of Sereal, there are two Google Groups:

       Announcements around Sereal (extremely low volume):
       <https://groups.google.com/forum/?fromgroups#!forum/sereal-announce>

       Sereal development list:
       <https://groups.google.com/forum/?fromgroups#!forum/sereal-dev>

AUTHORS AND CONTRIBUTORS

       Yves Orton <demerphq@gmail.com>

       Damian Gryski

       Steffen Mueller <smueller@cpan.org>

       Rafaël Garcia-Suarez

       Ævar Arnfjörð Bjarmason <avar@cpan.org>

       Tim Bunce

       Daniel Dragan <bulkdd@cpan.org> (Windows support and bugfixes)

       Zefram

       Borislav Nikolov

       Ivan Kruglov <ivan.kruglov@yahoo.com>

       Eric Herman <eric@freesa.org>

       Some inspiration and code were taken from Marc Lehmann's excellent
       JSON::XS module due to obvious overlap in problem domain.

ACKNOWLEDGMENT

       This module was originally developed for Booking.com.  With approval
       from Booking.com, this module was generalized and published on CPAN,
       for which the authors would like to express their gratitude.

COPYRIGHT AND LICENSE

       Copyright (C) 2012, 2013, 2014 by Steffen Mueller
       Copyright (C) 2012, 2013, 2014 by Yves Orton

       The license for the code in this distribution is the following, with
       the exceptions listed below:

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

       Except portions taken from Marc Lehmann's code for the JSON::XS module,
       which is licensed under the same terms as this module.  (Many thanks to
       Marc for inspiration, and code.)

       Also except the code for the Snappy compression library, whose license
       is reproduced below and which, to the best of our knowledge, is
       compatible with this module's license. The license for the enclosed
       Snappy code is:

         Copyright 2011, Google Inc.
         All rights reserved.

         Redistribution and use in source and binary forms, with or without
         modification, are permitted provided that the following conditions are
         met:

           * Redistributions of source code must retain the above copyright
         notice, this list of conditions and the following disclaimer.
           * Redistributions in binary form must reproduce the above
         copyright notice, this list of conditions and the following disclaimer
         in the documentation and/or other materials provided with the
         distribution.
           * Neither the name of Google Inc. nor the names of its
         contributors may be used to endorse or promote products derived from
         this software without specific prior written permission.

         THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
         "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
         LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
         A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
         OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
         SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
         LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
         DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
         THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
         (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
         OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

perl v5.34.0                      2022-02-20                Sereal::Decoder(3)