Sereal::Decoder(3)    User Contributed Perl Documentation   Sereal::Decoder(3)

NAME

6       Sereal::Decoder - Fast, compact, powerful binary deserialization
7

SYNOPSIS

         use Sereal::Decoder
           qw(decode_sereal sereal_decode_with_object scalar_looks_like_sereal);

         my $decoder = Sereal::Decoder->new({...options...});

         my $structure;
         $decoder->decode($blob, $structure); # deserializes into $structure

         # or if you don't have references to the top level structure, this works, too:
         $structure = $decoder->decode($blob);

         # alternatively functional interface: (See Sereal::Performance)
         sereal_decode_with_object($decoder, $blob, $structure);
         $structure = sereal_decode_with_object($decoder, $blob);

         # much slower functional interface with no persistent objects:
         decode_sereal($blob, {... options ...}, $structure);
         $structure = decode_sereal($blob, {... options ...});

         # Not a full validation, but just a quick check for a reasonable header:
         my $is_likely_sereal = scalar_looks_like_sereal($some_string);
         # or:
         $is_likely_sereal = $decoder->looks_like_sereal($some_string);

DESCRIPTION

34       This library implements a deserializer for an efficient, compact-
35       output, and feature-rich binary protocol called Sereal.  Its sister
36       module Sereal::Encoder implements an encoder for this format.  The two
37       are released separately to allow for independent and safer upgrading.
38
39       The Sereal protocol versions that are compatible with this decoder
40       implementation are currently protocol versions 1, 2, 3 and 4. As it
41       stands, it will refuse to attempt to decode future versions of the
42       protocol, but if necessary there is likely going to be an option to
43       decode the parts of the input that are compatible with version 4 of the
44       protocol. The protocol was designed to allow for this.
45
46       The protocol specification and many other bits of documentation can be
47       found in the github repository. Right now, the specification is at
48       <https://github.com/Sereal/Sereal/blob/master/sereal_spec.pod>, there
49       is a discussion of the design objectives in
50       <https://github.com/Sereal/Sereal/blob/master/README.pod>, and the
51       output of our benchmarks can be seen at
52       <https://github.com/Sereal/Sereal/wiki/Sereal-Comparison-Graphs>.
53

CLASS METHODS

55   new
       Constructor. Optionally takes a hash reference as first parameter. This
       hash reference may contain any number of options that influence the
       behaviour of the decoder.

       Currently, the following options are recognized. None of them is on by
       default.
62
63       refuse_snappy
64
65       If set, the decoder will refuse Snappy-compressed input data. This can
66       be desirable for robustness. See the section "ROBUSTNESS" below.
67
68       refuse_objects
69
70       If set, the decoder will refuse deserializing any objects in the input
71       stream and instead throw an exception. Defaults to off. See the section
72       "ROBUSTNESS" below.
73
74       no_bless_objects
75
76       If set, the decoder will deserialize any objects in the input stream
77       but without blessing them. Defaults to off. See the section
78       "ROBUSTNESS" below.
79
80       validate_utf8
81
       If set, the decoder will refuse invalid UTF-8 byte sequences. This is
       off by default, but turning it on is strongly encouraged if you're
       dealing with any data that has been encoded by an external source (e.g.
       HTTP cookies).
86
87       max_recursion_depth
88
89       "Sereal::Decoder" is recursive. If you pass it a Sereal document that
90       is deeply nested, it will eventually exhaust the C stack. Therefore,
91       there is a limit on the depth of recursion that is accepted. It
92       defaults to 10000 nested calls. You may choose to override this value
93       with the "max_recursion_depth" option.  Beware that setting it too high
94       can cause hard crashes.
95
96       Do note that the setting is somewhat approximate. Setting it to 10000
97       may break at somewhere between 9997 and 10003 nested structures
98       depending on their types.
99
100       max_num_hash_entries
101
102       If set to a non-zero value (default: 0), then "Sereal::Decoder" will
103       refuse to deserialize any hash/dictionary (or hash-based object) with
104       more than that number of entries. This is to be able to respond quickly
105       to any future hash-collision attacks on Perl's hash function. Chances
106       are, you don't want or need this. For a gentle introduction to the
107       topic from the cryptographic point of view, see
108       <http://en.wikipedia.org/wiki/Collision_attack>.
109
110       incremental
111
112       If set to a non-zero value (default: 0), then "Sereal::Decoder" will
113       destructively parse Sereal documents out of a variable. Every time a
114       Sereal document is successfully parsed it is removed from the front of
115       the string it is parsed from.
116
117       This means you can do this:
118
           while (length $buffer) {
               my $data = decode_sereal($buffer, {incremental => 1});
           }
122
123       alias_smallint
124
       If set to a true value then "Sereal::Decoder" will share integers from
       -16 to 15 (encoded as either SRL_HDR_NEG or SRL_HDR_POS) as read-only
       aliases to a common SV.
128
129       The result of this may be significant space savings in data structures
130       with many integers in the specified range. The cost is more memory used
131       by the decoder and a very modest speed penalty when deserializing.
132
133       Note this option changes the structure of the dumped data. Use with
134       caution.
135
136       See also the "alias_varint_under" option.
137
138       alias_varint_under
139
       If set to a true positive integer smaller than 16 then this option is
       similar to setting "alias_smallint" and causes all integers from -16 to
       15 to be shared as read-only aliases to the same SV, except that this
       treatment ALSO applies to SRL_HDR_VARINT. If set to a value larger than
       16 then this applies to all varints under the value set. (In
       general SRL_HDR_VARINT is used only for integers larger than 15, and
       SRL_HDR_NEG and SRL_HDR_POS are used for -16 to -1 and 0 to 15
       respectively.)
148
149       In simple terms if you want to share values larger than 16 then you
150       should use this option, if you want to share only values in the -16 to
151       15 range then you should use the "alias_smallint" option instead.
152
       The result of this may be significant space savings in data structures
       with many integers in the desired range. The cost is more memory used
       by the decoder and a very modest speed penalty when deserializing.
156
157       Note this option changes the structure of the dumped data. Use with
158       caution.
159
160       use_undef
161
       If set to a true value then this causes any undef value to be
       deserialized as PL_sv_undef. This may change the structure of the data
       structure being dumped. Do not enable this unless you know what you are
       doing.
165
166       set_readonly
167
168       If set to a true value then the output will be completely readonly
169       (deeply).
170
171       set_readonly_scalars
172
173       If set to a true value then scalars in the output will be readonly
174       (deeply).  References won't be readonly.
175
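       As an illustration of how the options above combine, the following is
       a minimal sketch of a decoder configured for untrusted input. It
       assumes that Sereal::Encoder is installed to produce a test document,
       and "My::Point" is a hypothetical class name used only for this
       example. The value chosen for "max_num_hash_entries" is arbitrary.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

# My::Point is a hypothetical class; any blessed reference would do.
my $blob = Sereal::Encoder->new->encode(bless({ x => 1, y => 2 }, 'My::Point'));

# A decoder for untrusted input: no objects, strict UTF-8, bounded hashes.
my $strict = Sereal::Decoder->new({
    refuse_objects       => 1,
    validate_utf8        => 1,
    max_num_hash_entries => 10_000,   # arbitrary illustrative limit
});

# refuse_objects makes decode() throw, so trap the exception with eval.
my $decoded_ok = eval { $strict->decode($blob); 1 } ? 1 : 0;

# no_bless_objects decodes the same document but leaves the hash unblessed.
my $plain = Sereal::Decoder->new({ no_bless_objects => 1 })->decode($blob);

printf "strict decode succeeded: %s\n", $decoded_ok ? 'yes' : 'no';
printf "plain result is a %s reference\n", ref $plain;
```

       Which of the two object-related options fits depends on whether the
       application can do anything useful with the unblessed data.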

INSTANCE METHODS

177   decode
178       Given a byte string of Sereal data, the "decode" call deserializes that
179       data structure. The result can be obtained in one of two ways: "decode"
180       accepts a second parameter, which is a scalar to write the result to,
181       AND "decode" will return the resulting data structure.
182
183       The two are subtly different in case of data structures that contain
184       references to the root element. In that case, the return value will be
185       a (non-recursive) copy of the reference. The pass-in style is more
186       correct.  In other words,
187
         $decoder->decode($sereal_string, my $out);
         # is almost the same but safer than:
         my $out = $decoder->decode($sereal_string);

       This is an unfortunate side-effect of Perl's standard copy semantics of
       assignment. Possibly one day we will have an alternative to this.
194
195   decode_with_header
196       Given a byte string of Sereal data, the "decode_with_header" call
197       deserializes that data structure as "decode" would do, however it also
198       decodes the optional user data structure that can be embedded into a
199       Sereal document, inside the header  (see Sereal::Encoder::encode).
200
201       It accepts an optional second parameter, which is a scalar to write the
202       body to, and an optional third parameter, which is a scalar to write
203       the header to.
204
205       Regardless of the number of parameters received, "decode_with_header"
206       returns an ArrayRef containing the deserialized header, and the
207       deserialized body, in this order.
208
       See "decode" for the subtle difference between the one, two and three
       parameter versions.

       If there is no header in a Sereal document, the corresponding variable
       or return value will be set to undef.
214
215   decode_only_header
       Given a byte string of Sereal data, the "decode_only_header" call
       deserializes only the optional user data structure that can be embedded
       into a Sereal document, inside the header (see
       Sereal::Encoder::encode).
220
221       It accepts an optional second parameter, which is a scalar to write the
222       header to.
223
224       Regardless of the number of parameters received, "decode_only_header"
225       returns the resulting data structure.
226
       See "decode" for the subtle difference between the one and two
       parameter versions.

       If there is no header in a Sereal document, the corresponding variable
       or return value will be set to undef.
232
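       The two header-aware methods above can be sketched as follows. This
       example assumes that Sereal::Encoder is installed and that its
       "encode" method accepts the user header as an optional second
       argument. The header and body contents are made up for illustration.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

my $encoder = Sereal::Encoder->new;
my $decoder = Sereal::Decoder->new;

# Assumption: encode() takes the user header as its optional second argument.
my $blob = $encoder->encode({ payload => [1, 2, 3] }, { schema => 'v1' });

# decode_with_header returns an array reference: [ $header, $body ].
my ($header, $body) = @{ $decoder->decode_with_header($blob) };

# decode_only_header deserializes the header and leaves the body alone.
my $only_header = $decoder->decode_only_header($blob);

print "schema from full decode:   $header->{schema}\n";
print "schema from header decode: $only_header->{schema}\n";
print "payload items:             ", scalar @{ $body->{payload} }, "\n";
```

       Reading only the header is useful when the header carries routing or
       schema information and the body may not need to be decoded at all.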
233   decode_with_offset
234       Same as the "decode" method, except as second parameter, you must pass
235       an integer offset into the input string, at which the decoding is to
236       start. The optional "pass-in" style scalar (see "decode" above) is
237       relegated to being the third parameter.
238
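       A minimal sketch of decoding at an offset, assuming Sereal::Encoder is
       installed. The four-byte "HDR:" prefix is a hypothetical framing
       marker invented for this example, not part of the Sereal format.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

my $blob   = Sereal::Encoder->new->encode({ answer => 42 });
my $prefix = 'HDR:';                 # hypothetical framing bytes
my $framed = $prefix . $blob;

# Skip the framing bytes and start decoding where the document begins.
my $data = Sereal::Decoder->new->decode_with_offset($framed, length $prefix);

print "answer: $data->{answer}\n";
```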
239   decode_only_header_with_offset
240       Same as the "decode_only_header" method, except as second parameter,
241       you must pass an integer offset into the input string, at which the
242       decoding is to start. The optional "pass-in" style scalar (see
243       "decode_only_header" above) is relegated to being the third parameter.
244
245   decode_with_header_and_offset
246       Same as the "decode_with_header" method, except as second parameter,
247       you must pass an integer offset into the input string, at which the
248       decoding is to start. The optional "pass-in" style scalars (see
249       "decode_with_header" above) are relegated to being the third and fourth
250       parameters.
251
252   bytes_consumed
253       After using the various "decode" methods documented previously,
254       "bytes_consumed" can return the number of bytes from the body of the
255       input string that were actually consumed by the decoder. That is, if
256       you append random garbage to a valid Sereal document, "decode" will
257       happily decode the data and ignore the garbage. If that is an error in
258       your use case, you can use "bytes_consumed" to catch it.
259
         my $out = $decoder->decode($sereal_string);
         if (length($sereal_string) != $decoder->bytes_consumed) {
           die "Not all input data was consumed!";
         }

       Chances are that if you do this, you're violating the robustness
       principle of "be strict in what you emit but lenient in what you
       accept".
267
268       You can also use this to deserialize a list of Sereal documents that is
269       concatenated into the same string (code not very robust...):
270
         my @out;
         my $pos = 0;
         eval {
           while (1) {
             push @out, $decoder->decode_with_offset($sereal_string, $pos);
             $pos += $decoder->bytes_consumed;
             last if $pos >= length($sereal_string)
                  or not $decoder->bytes_consumed;
           }
         };

       As mentioned, only the bytes consumed from the body are considered. So
       the following example is correct, as only the header is deserialized:

         my $header = $decoder->decode_only_header($sereal_string);
         my $count = $decoder->bytes_consumed;
         # $count is 0
288
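       For concatenated documents, the "incremental" option described under
       "new" is an alternative to tracking offsets by hand. The following is
       a sketch under the assumption that Sereal::Encoder is installed. The
       three sample values are arbitrary.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

my $encoder = Sereal::Encoder->new;

# Three documents concatenated into one byte string.
my $buffer = join '', map { $encoder->encode($_) } ({ id => 1 }, [2, 3], 'four');

# In incremental mode each successful decode removes the parsed
# document from the front of $buffer.
my $decoder = Sereal::Decoder->new({ incremental => 1 });
my @docs;
while (length $buffer) {
    push @docs, $decoder->decode($buffer);
}

printf "decoded %d documents, %d bytes left\n", scalar @docs, length $buffer;
```

       Because the buffer is modified in place, keep a copy first if the
       original bytes are still needed. A decoding error throws an exception,
       which also ends the loop.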
289   decode_from_file
           Sereal::Decoder->decode_from_file($file);
           $decoder->decode_from_file($file);

       Read and decode the file specified. If called in list context and
       incremental mode is enabled then decodes all packets contained in the
       file and returns a list, otherwise decodes the first (or only) packet
       in the file. Accepts an optional "target" variable as a second argument.
297
298   looks_like_sereal
299       Performs some rudimentary check to determine if the argument appears to
300       be a valid Sereal packet or not. These tests are not comprehensive and
301       a true result does not mean that the document is valid, merely that it
302       appears to be valid. On the other hand a false result is always
303       reliable.
304
305       The return of this method may be treated as a simple boolean but is in
306       fact a more complex return. When the argument does not look anything
307       like a Sereal document then the return is perl's FALSE, which has the
308       property of being string equivalent to "" and numerically equivalent to
309       0. However when the argument appears to be a UTF-8 encoded protocol 3
310       Sereal document (by noticing that the \xF3 in the magic string has been
311       replaced by \xC3\xB3) then it returns 0 (the number, which is string
312       equivalent to "0"), and otherwise returns the protocol version of the
313       document. This means you can write something like this:
314
           use feature 'say';    # needed for say()

           my $type = Sereal::Decoder->looks_like_sereal($thing);
           if ($type eq '') {
               say "Not a Sereal document";
           } elsif ($type eq '0') {
               say "Possibly utf8 encoded Sereal document";
           } else {
               say "Sereal document version $type";
           }
323
324       For reference, Sereal's magic value is a four byte string which is
325       either "=srl" for protocol version 1 and 2 or "=\xF3rl" for protocol
326       version 3 and later. This function checks that the magic string
327       corresponds with the reported version number, as well as other checks,
328       which may be enhanced in the future.
329
330       Note that looks_like_sereal() may be called as a class or object
331       method, and may also be called as a single argument function. See the
332       related scalar_looks_like_sereal() for a version which may ONLY be
333       called as a function, not as a method (and which is typically much
334       faster).
335
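       A short sketch comparing the method and function forms of the check,
       assuming Sereal::Encoder is installed to produce a real document. The
       protocol version reported depends on the encoder in use, so it is
       printed rather than assumed.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder qw(scalar_looks_like_sereal);

my $blob = Sereal::Encoder->new->encode([1, 2, 3]);

# Method form: true (the protocol version) for a plausible document.
my $version = Sereal::Decoder->looks_like_sereal($blob);

# Function form: same check, but callable only as a function.
my $fast  = scalar_looks_like_sereal($blob);
my $bogus = scalar_looks_like_sereal('just some text');

print "document reports protocol version $version\n";
print "plain text looks like Sereal: ", ($bogus ? 'yes' : 'no'), "\n";
```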

EXPORTABLE FUNCTIONS

337   sereal_decode_with_object
338       The functional interface that is equivalent to using "decode". Takes a
339       decoder object reference as first parameter, followed by a byte string
340       to deserialize.  Optionally takes a third parameter, which is the
341       output scalar to write to. See the documentation for "decode" above for
342       details.
343
344       This functional interface is marginally faster than the OO interface
345       since it avoids method resolution overhead and, on sufficiently modern
346       Perl versions, can usually avoid subroutine call overhead. See
347       Sereal::Performance for a discussion on how to tune Sereal for maximum
348       performance if you need to.
349
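       A minimal sketch of the functional interface with a reusable decoder
       object, assuming Sereal::Encoder is installed. The sample data is
       arbitrary.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder qw(sereal_decode_with_object);

my $decoder = Sereal::Decoder->new;
my $blob    = Sereal::Encoder->new->encode({ list => [1 .. 3] });

# Pass-in style: the result is written into the third argument.
sereal_decode_with_object($decoder, $blob, my $into);

# Return style: the result comes back as the return value.
my $returned = sereal_decode_with_object($decoder, $blob);

printf "pass-in: %d items, returned: %d items\n",
    scalar @{ $into->{list} }, scalar @{ $returned->{list} };
```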
350   sereal_decode_with_header_with_object
351       The functional interface that is equivalent to using
352       "decode_with_header".  Takes a decoder object reference as first
353       parameter, followed by a byte string to deserialize. Optionally takes
354       third and fourth parameters, which are the output scalars to write to.
355       See the documentation for "decode_with_header" above for details.
356
357       This functional interface is marginally faster than the OO interface
358       since it avoids method resolution overhead and, on sufficiently modern
359       Perl versions, can usually avoid subroutine call overhead. See
360       Sereal::Performance for a discussion on how to tune Sereal for maximum
361       performance if you need to.
362
363   sereal_decode_only_header_with_object
       The functional interface that is equivalent to using
       "decode_only_header".  Takes a decoder object reference as first
       parameter, followed by a byte string to deserialize. Optionally takes a
       third parameter, which is the output scalar to write to.  See the
       documentation for "decode_only_header" above for details.
369
370       This functional interface is marginally faster than the OO interface
371       since it avoids method resolution overhead and, on sufficiently modern
372       Perl versions, can usually avoid subroutine call overhead. See
373       Sereal::Performance for a discussion on how to tune Sereal for maximum
374       performance if you need to.
375
376   sereal_decode_only_header_with_offset_with_object
377       The functional interface that is equivalent to using
378       "decode_only_header_with_offset".  Same as the
379       "sereal_decode_only_header_with_object" function, except as the third
380       parameter, you must pass an integer offset into the input string, at
381       which the decoding is to start. The optional "pass-in" style scalar
382       (see "sereal_decode_only_header_with_object" above) is relegated to
383       being the fourth parameter.
384
385       This functional interface is marginally faster than the OO interface
386       since it avoids method resolution overhead and, on sufficiently modern
387       Perl versions, can usually avoid subroutine call overhead. See
388       Sereal::Performance for a discussion on how to tune Sereal for maximum
389       performance if you need to.
390
391   sereal_decode_with_header_and_offset_with_object
392       The functional interface that is equivalent to using
393       "decode_with_header_and_offset".  Same as the
394       "sereal_decode_with_header_with_object" function, except as the third
395       parameter, you must pass an integer offset into the input string, at
396       which the decoding is to start. The optional "pass-in" style scalars
397       (see "sereal_decode_with_header_with_object" above) are relegated to
398       being the fourth and fifth parameters.
399
400       This functional interface is marginally faster than the OO interface
401       since it avoids method resolution overhead and, on sufficiently modern
402       Perl versions, can usually avoid subroutine call overhead. See
403       Sereal::Performance for a discussion on how to tune Sereal for maximum
404       performance if you need to.
405
406   sereal_decode_with_offset_with_object
       The functional interface that is equivalent to using
       "decode_with_offset".  Same as the "sereal_decode_with_object"
       function, except as the third parameter, you must pass an integer
       offset into the input string, at which the decoding is to start. The
       optional "pass-in" style scalar (see "sereal_decode_with_object" above)
       is relegated to being the fourth parameter.
413
414       This functional interface is marginally faster than the OO interface
415       since it avoids method resolution overhead and, on sufficiently modern
416       Perl versions, can usually avoid subroutine call overhead. See
417       Sereal::Performance for a discussion on how to tune Sereal for maximum
418       performance if you need to.
419
420   decode_sereal
421       The functional interface that is equivalent to using "new" and
422       "decode".  Expects a byte string to deserialize as first argument,
423       optionally followed by a hash reference of options (see documentation
424       for "new()"). Finally, "decode_sereal" supports a third parameter,
425       which is the output scalar to write to. See the documentation for
426       "decode" above for details.
427
428       This functional interface is significantly slower than the OO interface
429       since it cannot reuse the decoder object.
430
431   decode_sereal_with_header_data
       The functional interface that is equivalent to using "new" and
       "decode_with_header".  Expects a byte string to deserialize as first
       argument, optionally followed by a hash reference of options (see
       documentation for "new()"). Finally, "decode_sereal_with_header_data"
       supports third and fourth parameters, which are the output scalars to
       write to. See the documentation for "decode_with_header" above for
       details.
438
439       This functional interface is significantly slower than the OO interface
440       since it cannot reuse the decoder object.
441
442   scalar_looks_like_sereal
443       The functional interface that is equivalent to using
444       "looks_like_sereal".
445
       Note that this version cannot be called as a method. It is normally
       executed as a custom opcode, so errors about its usage may be caught
       at compile time, and it should be much faster than
       looks_like_sereal.
450

ROBUSTNESS

452       This implementation of a Sereal decoder tries to be as robust to
453       invalid input data as reasonably possible. This means that it should
454       never (though read on) segfault. It may, however, cause a large malloc
455       to fail. Generally speaking, invalid data should cause a Perl-trappable
456       exception. The one exception is that for Snappy-compressed Sereal
457       documents, the Snappy library may cause segmentation faults (invalid
458       reads or writes).  This should only be a problem if you do not checksum
459       your data (internal checksum support is a To-Do) or if you accept data
460       from potentially malicious sources.
461
       It requires a lot of run-time boundary checks to prevent decoder
       segmentation faults on invalid data. We implemented them in the
       lightest way possible. Adding robustness against running out of memory
       would cause a very significant run-time overhead. In most cases of
       random garbage (with valid header no less) when a malloc() fails due to
       invalid data, the problem was caused by a very large array or string
       length. This kind of very large malloc can then fail, being trappable
       from Perl. Only when a packet causes many repeated allocations do you
       risk causing a hard OOM error from the kernel that cannot be trapped
       because Perl may require some small allocations to succeed before the
       now-invalid memory is released. It is at least not entirely trivial to
       craft a Sereal document that causes this behaviour.
474
475       Finally, deserializing proper objects is potentially a problem because
476       classes can define a destructor. Thus, the data fed to the decoder can
477       cause the (deferred) execution of any destructor in your application.
       That's why the "refuse_objects" option exists and what the
       "no_bless_objects" option can be used for as well. Later on, we may or
       may not
480       provide a facility to whitelist classes. Furthermore, if the encoder
481       emitted any objects using "FREEZE" callbacks, the "THAW" class method
482       may be invoked on the respective classes. If you can't trust the source
483       of your Sereal documents, you may want to use the "refuse_objects"
484       option. For more details on the "FREEZE/THAW" mechanism, please refer
485       to Sereal::Encoder.
486
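       Since invalid data should normally surface as a Perl-trappable
       exception, callers handling untrusted input can wrap decoding in
       "eval". The following is a sketch, assuming Sereal::Encoder is
       installed. The truncated document and the plain string are contrived
       bad inputs, and the exact error messages are not specified here.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

my $decoder = Sereal::Decoder->new;
my $blob    = Sereal::Encoder->new->encode([1 .. 100]);

# Two contrived bad inputs: a document cut short, and non-Sereal text.
my $truncated = substr($blob, 0, length($blob) - 10);
my @inputs    = ($truncated, 'not a Sereal document');

my @errors;
for my $input (@inputs) {
    # decode() dies on invalid input; eval turns that into a false value.
    my $ok = eval { $decoder->decode($input); 1 };
    push @errors, $@ unless $ok;
}

printf "%d of %d inputs were rejected\n", scalar @errors, scalar @inputs;
```

       The same decoder object is reused after each failure, which is the
       usual pattern for a long-running process.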

PERFORMANCE

488       Please refer to the Sereal::Performance document that has more detailed
489       information about Sereal performance and tuning thereof.
490

THREAD-SAFETY

       "Sereal::Decoder" is thread-safe on Perl 5.8.7 and higher. This means
       "thread-safe" in the sense that if you create a new thread, all
       "Sereal::Decoder" objects will become a reference to undef in the new
       thread. This might change in a future release to become a full clone of
       the decoder object.
497

BUGS, CONTACT AND SUPPORT

499       For reporting bugs, please use the github bug tracker at
500       <http://github.com/Sereal/Sereal/issues>.
501
502       For support and discussion of Sereal, there are two Google Groups:
503
504       Announcements around Sereal (extremely low volume):
505       <https://groups.google.com/forum/?fromgroups#!forum/sereal-announce>
506
507       Sereal development list:
508       <https://groups.google.com/forum/?fromgroups#!forum/sereal-dev>
509

AUTHORS AND CONTRIBUTORS

511       Yves Orton <demerphq@gmail.com>
512
513       Damian Gryski
514
515       Steffen Mueller <smueller@cpan.org>
516
517       Rafaël Garcia-Suarez
518
519       Ævar Arnfjörð Bjarmason <avar@cpan.org>
520
521       Tim Bunce
522
523       Daniel Dragan <bulkdd@cpan.org> (Windows support and bugfixes)
524
525       Zefram
526
527       Borislav Nikolov
528
529       Ivan Kruglov <ivan.kruglov@yahoo.com>
530
531       Eric Herman <eric@freesa.org>
532
533       Some inspiration and code was taken from Marc Lehmann's excellent
534       JSON::XS module due to obvious overlap in problem domain.
535

ACKNOWLEDGMENT

537       This module was originally developed for Booking.com.  With approval
538       from Booking.com, this module was generalized and published on CPAN,
539       for which the authors would like to express their gratitude.

COPYRIGHT AND LICENSE

       Copyright (C) 2012, 2013, 2014 by Steffen Mueller

       Copyright (C) 2012, 2013, 2014 by Yves Orton
544
545       The license for the code in this distribution is the following, with
546       the exceptions listed below:
547
548       This library is free software; you can redistribute it and/or modify it
549       under the same terms as Perl itself.
550
551       Except portions taken from Marc Lehmann's code for the JSON::XS module,
552       which is licensed under the same terms as this module.  (Many thanks to
553       Marc for inspiration, and code.)
554
555       Also except the code for Snappy compression library, whose license is
556       reproduced below and which, to the best of our knowledge, is compatible
557       with this module's license. The license for the enclosed Snappy code
558       is:
559
560         Copyright 2011, Google Inc.
561         All rights reserved.
562
563         Redistribution and use in source and binary forms, with or without
564         modification, are permitted provided that the following conditions are
565         met:
566
567           * Redistributions of source code must retain the above copyright
568         notice, this list of conditions and the following disclaimer.
569           * Redistributions in binary form must reproduce the above
570         copyright notice, this list of conditions and the following disclaimer
571         in the documentation and/or other materials provided with the
572         distribution.
573           * Neither the name of Google Inc. nor the names of its
574         contributors may be used to endorse or promote products derived from
575         this software without specific prior written permission.
576
577         THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
578         "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
579         LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
580         A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
581         OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
582         SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
583         LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
584         DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
585         THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
586         (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
587         OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
588

perl v5.30.0                      2019-07-26                Sereal::Decoder(3)