Sereal::Encoder(3)    User Contributed Perl Documentation    Sereal::Encoder(3)

6 Sereal::Encoder - Fast, compact, powerful binary serialization
7
    use Sereal::Encoder qw(encode_sereal sereal_encode_with_object);

    my $encoder = Sereal::Encoder->new({...options...});
    my $out = $encoder->encode($structure);

    # alternatively the functional interface:
    $out = sereal_encode_with_object($encoder, $structure);

    # much slower functional interface with no persistent objects:
    $out = encode_sereal($structure, {... options ...});
19
21 This library implements an efficient, compact-output, and feature-rich
22 serializer using a binary protocol called Sereal. Its sister module
23 Sereal::Decoder implements a decoder for this format. The two are
24 released separately to allow for independent and safer upgrading. If
25 you care greatly about performance, consider reading the
26 Sereal::Performance documentation after finishing this document.
27
28 The Sereal protocol version emitted by this encoder implementation is
29 currently protocol version 4 by default.
30
31 The protocol specification and many other bits of documentation can be
32 found in the github repository. Right now, the specification is at
33 <https://github.com/Sereal/Sereal/blob/master/sereal_spec.pod>, there
34 is a discussion of the design objectives in
35 <https://github.com/Sereal/Sereal/blob/master/README.pod>, and the
36 output of our benchmarks can be seen at
37 <https://github.com/Sereal/Sereal/wiki/Sereal-Comparison-Graphs>. For
38 more information on getting the best performance out of Sereal, have a
39 look at the "PERFORMANCE" section below.
40
42 new
43 Constructor. Optionally takes a hash reference as first parameter. This
44 hash reference may contain any number of options that influence the
45 behaviour of the encoder.
46
Currently, the following options are recognized. None of them is on by
default.
49
50 compress
51
If this option is provided and true, compression of the document body is
53 enabled. As of Sereal version 4, three different compression
54 techniques are supported and can be enabled by setting "compress" to
55 the respective named constants (exportable from the "Sereal::Encoder"
56 module): Snappy (named constant: "SRL_SNAPPY"), Zlib ("SRL_ZLIB") and
57 Zstd ("SRL_ZSTD"). For your convenience, there is also a
58 "SRL_UNCOMPRESSED" constant.
59
60 If this option is set, then the Snappy-related options below are
61 ignored. They are otherwise recognized for compatibility only.
62
63 compress_threshold
64
65 The size threshold (in bytes) of the uncompressed output below which
66 compression is not even attempted even if enabled. Defaults to one
67 kilobyte (1024 bytes). Set this to 0 and "compress" to a
68 non-"SRL_UNCOMPRESSED" value to always attempt to compress. Note that
69 the document will not be compressed if the resulting size will be
70 bigger than the original size (even if "compress_threshold" is 0).
71
72 compress_level
73
If Zlib or Zstd compression is used, this option sets the compression
level. Zlib accepts levels from 1 (fastest) to 9 (best) and defaults
to 6. Zstd accepts levels from 1 (fastest) to 22 (best) and defaults
to 3.
78
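As a rough illustration of the "compress", "compress_level" and
"compress_threshold" options described above, the following sketch encodes
the same structure with and without Zlib compression. The sample data is
made up for the example; actual savings depend entirely on your data.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(SRL_UNCOMPRESSED SRL_ZLIB);

# Hypothetical sample data: highly repetitive, so it should compress well.
my $data = { words => [ ('sereal') x 1000 ] };

my $plain_encoder = Sereal::Encoder->new({ compress => SRL_UNCOMPRESSED });
my $zlib_encoder  = Sereal::Encoder->new({
    compress           => SRL_ZLIB,
    compress_level     => 9,    # 1 (fastest) .. 9 (best) for Zlib
    compress_threshold => 0,    # always attempt compression
});

my $plain_out = $plain_encoder->encode($data);
my $zlib_out  = $zlib_encoder->encode($data);

printf "uncompressed: %d bytes, zlib: %d bytes\n",
    length($plain_out), length($zlib_out);
```
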
79 snappy
80
81 See also the "compress" option. This option is provided only for
82 compatibility with Sereal V1.
83
84 If set, the main payload of the Sereal document will be compressed
85 using Google's Snappy algorithm. This can yield anywhere from no effect
86 to significant savings on output size at rather low run time cost. If
87 in doubt, test with your data whether this helps or not.
88
89 The decoder (version 0.04 and up) will know how to handle Snappy-
90 compressed Sereal documents transparently.
91
92 Note: The "snappy_incr" and "snappy" options are identical in Sereal
93 protocol v2 and up (so by default). If using an older protocol version
94 (see "protocol_version" and "use_protocol_v1" options below) to emit
95 Sereal V1 documents, this emits non-incrementally decodable documents.
96 See "snappy_incr" in those cases.
97
98 snappy_incr
99
100 See also the "compress" option. This option is provided only for
101 compatibility with Sereal V1.
102
103 Same as the "snappy" option for default operation (that is in Sereal v2
104 or up).
105
106 In Sereal V1, enables a version of the Snappy protocol which is
107 suitable for incremental parsing of packets. See also the "snappy"
108 option above for more details.
109
110 snappy_threshold
111
112 See also the "compress" option. This option is provided only for
113 compatibility with Sereal V1.
114
115 This option is a synonym for the "compress_threshold" option, but only
116 if Snappy compression is enabled.
117
118 croak_on_bless
119
120 If this option is set, then the encoder will refuse to serialize
121 blessed references and throw an exception instead.
122
123 This can be important because blessed references can mean executing a
124 destructor on a remote system or generally executing code based on
125 data.
126
127 See also "no_bless_objects" to skip the blessing of objects. When both
flags are set, "croak_on_bless" has a higher precedence than
129 "no_bless_objects".
130
131 freeze_callbacks
132
133 This option was introduced in Sereal v2 and needs a Sereal v2 decoder.
134
135 If this option is set, the encoder will check for and possibly invoke
136 the "FREEZE" method on any object in the input data. An object that was
137 serialized using its "FREEZE" method will have its corresponding "THAW"
138 class method called during deserialization. The exact semantics are
139 documented below under "FREEZE/THAW CALLBACK MECHANISM".
140
141 Beware that using this functionality means a significant slowdown for
142 object serialization. Even when serializing objects without a "FREEZE"
143 method, the additional method look up will cost a small amount of
144 runtime. Yes, "Sereal::Encoder" is so fast that this may make a
145 difference.
146
147 no_bless_objects
148
149 If this option is set, then the encoder will serialize blessed
150 references without the bless information and provide plain data
151 structures instead.
152
153 See also the "croak_on_bless" option above for more details.
154
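The difference between "croak_on_bless" and "no_bless_objects" can be
sketched as follows. The class name is hypothetical, and the sketch assumes
that Sereal::Decoder is installed so the result can be inspected.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder qw(decode_sereal);

# Hypothetical class name, used only for illustration.
my $object = bless { name => 'example' }, 'My::Hypothetical::Class';

# croak_on_bless: encoding a blessed reference throws an exception.
my $strict_encoder = Sereal::Encoder->new({ croak_on_bless => 1 });
my $encoded_ok     = eval { $strict_encoder->encode($object); 1 };
my $croaked        = $encoded_ok ? 0 : 1;
print "croak_on_bless refused the object\n" if $croaked;

# no_bless_objects: the same object is encoded as a plain hash.
my $plain_encoder = Sereal::Encoder->new({ no_bless_objects => 1 });
my $decoded       = decode_sereal($plain_encoder->encode($object));
print "decoded reference type: ", ref($decoded), "\n";
```
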
155 undef_unknown
156
157 If set, unknown/unsupported data structures will be encoded as "undef"
158 instead of throwing an exception.
159
160 Mutually exclusive with "stringify_unknown". See also "warn_unknown"
161 below.
162
163 stringify_unknown
164
165 If set, unknown/unsupported data structures will be stringified and
166 encoded as that string instead of throwing an exception. The
167 stringification may cause a warning to be emitted by perl.
168
169 Mutually exclusive with "undef_unknown". See also "warn_unknown"
170 below.
171
172 warn_unknown
173
174 Only has an effect if "undef_unknown" or "stringify_unknown" are
175 enabled.
176
177 If set to a positive integer, any unknown/unsupported data structure
178 encountered will emit a warning. If set to a negative integer, it will
179 warn for unsupported data structures just the same as for a positive
180 value with one exception: For blessed, unsupported items that have
181 string overloading, we silently stringify without warning.
182
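The following sketch shows the three behaviours for unsupported data, using
a code reference as the unsupported item. It assumes Sereal::Decoder is
available for inspecting the result.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder qw(decode_sereal);

# A code reference is used here as an example of unsupported data.
my $data = { callback => sub { 42 } };

# Default behaviour: encode() throws an exception.
my $default_ok = eval { Sereal::Encoder->new->encode($data); 1 } ? 1 : 0;

# undef_unknown: the unsupported item is replaced by undef.
my $undef_encoder = Sereal::Encoder->new({ undef_unknown => 1 });
my $as_undef      = decode_sereal($undef_encoder->encode($data));

# stringify_unknown: the unsupported item is replaced by its stringification.
my $string_encoder = Sereal::Encoder->new({ stringify_unknown => 1 });
my $as_string      = decode_sereal($string_encoder->encode($data));

print "default encoder succeeded: $default_ok\n";
print "undef_unknown gives undef: ",
    (defined $as_undef->{callback} ? "no" : "yes"), "\n";
print "stringify_unknown gives: $as_string->{callback}\n";
```
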
183 max_recursion_depth
184
185 "Sereal::Encoder" is recursive. If you pass it a Perl data structure
186 that is deeply nested, it will eventually exhaust the C stack.
187 Therefore, there is a limit on the depth of recursion that is accepted.
188 It defaults to 10000 nested calls. You may choose to override this
189 value with the "max_recursion_depth" option. Beware that setting it too
190 high can cause hard crashes, so only do that if you KNOW that it is
191 safe to do so.
192
193 Do note that the setting is somewhat approximate. Setting it to 10000
194 may break at somewhere between 9997 and 10003 nested structures
195 depending on their types.
196
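A minimal sketch of the recursion limit in action, assuming a deliberately
low "max_recursion_depth" of 100 so that the effect is easy to observe. The
helper "nested_array" is hypothetical and exists only for this example.

```perl
use strict;
use warnings;
use Sereal::Encoder;

# Hypothetical helper: build an array reference nested $depth levels deep.
sub nested_array {
    my ($depth) = @_;
    my $structure = [];
    $structure = [$structure] for 1 .. $depth;
    return $structure;
}

my $encoder = Sereal::Encoder->new({ max_recursion_depth => 100 });

# Well below the limit: expected to encode normally.
my $shallow_ok = eval { $encoder->encode(nested_array(10)); 1 } ? 1 : 0;

# Well above the limit: expected to throw instead of exhausting the C stack.
my $deep_ok = eval { $encoder->encode(nested_array(200)); 1 } ? 1 : 0;

print "depth 10 encoded: $shallow_ok, depth 200 encoded: $deep_ok\n";
```
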
197 canonical
198
199 Enable all options which are related to producing canonical output, so
that two structures with similar contents produce the same serialized
201 form.
202
203 See the caveats elsewhere in this document about producing canonical
204 output.
205
206 Currently sets the default for the following parameters:
207 "canonical_refs" and "sort_keys". If the option is explicitly set then
208 this setting is ignored. More options may be added in the future.
209
210 You are warned that use of this option may incur additional performance
211 penalties in a future release by enabling other options than those
212 listed here.
213
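As a small sketch of what "canonical" is meant to provide: two hashes with
the same contents, filled in a different order, should serialize to the
same bytes. The key names are arbitrary, and the caveats discussed elsewhere
in this document still apply to less trivial data.

```perl
use strict;
use warnings;
use Sereal::Encoder;

my @names = qw(alpha beta gamma delta epsilon);

# Same contents, inserted in opposite order.
my (%forward, %backward);
$forward{$_}  = length $_ for @names;
$backward{$_} = length $_ for reverse @names;

my $encoder = Sereal::Encoder->new({ canonical => 1 });

my $forward_bytes  = $encoder->encode(\%forward);
my $backward_bytes = $encoder->encode(\%backward);

print "canonical encodings are ",
    ($forward_bytes eq $backward_bytes ? "identical" : "different"), "\n";
```
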
214 canonical_refs
215
Normally "Sereal::Encoder" will use ARRAYREF and HASHREF tags when the
item contains fewer than 16 items and is not referenced more than once.
218 This flag will override this optimization and use a standard REFN ARRAY
219 style tag output. This is primarily useful for producing canonical
220 output and for testing Sereal itself.
221
222 See "CANONICAL REPRESENTATION" for why you might want to use this, and
223 for the various caveats involved.
224
225 sort_keys
226
227 Normally "Sereal::Encoder" will output hashes in whatever order is
228 convenient, generally that used by perl to actually store the hash, or
229 whatever order was returned by a tied hash.
230
231 If this option is enabled then the Encoder will sort the keys before
232 outputting them. It uses more memory, and is quite a bit slower than
233 the default.
234
235 Generally speaking this should mean that a hash and a copy should
236 produce the same output. Nevertheless the user is warned that Perl has
237 a way of "morphing" variables on use, and some of its rules are a
238 little arcane (for instance utf8 keys), and so two hashes that might
239 appear to be the same might still produce different output as far as
240 Sereal is concerned.
241
242 As of 3.006_007 (prerelease candidate for 3.007) the sort order has
243 been changed to the following: order by length of keys (in bytes)
244 ascending, then by byte order of the raw underlying string, then by
245 utf8ness, with non-utf8 first. This order was chosen because it is the
246 most efficient to implement, both in terms of memory and time. This
247 sort order is enabled when sort_keys is set to 1.
248
249 You may also produce output in Perl "cmp" order, by setting sort_keys
250 to 2. And for backwards compatibility you may also produce output in
251 reverse Perl "cmp" order by setting sort_keys to 3. Prior to 3.006_007
252 this was the only sort order possible, although it was not explicitly
253 defined what it was.
254
255 Note that comparatively speaking both of the "cmp" sort orders are slow
and memory inefficient. Unless you have a really good reason, stick to
257 the default which is fast and as lean as possible.
258
259 Unless you are concerned with "cross process canonical representation"
260 then it doesn't matter what option you choose.
261
262 See "CANONICAL REPRESENTATION" for why you might want to use this, and
263 for the various caveats involved.
264
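The ordering described for sort_keys set to 1 (length in bytes, then raw
byte order, then utf8ness) can be approximated in pure Perl. This is only a
sketch of the rule as stated above, not the encoder's actual implementation;
the helper names "raw_bytes" and "by_sereal_key_order" are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the bytes Perl stores internally for a string.
sub raw_bytes {
    my ($string) = @_;
    my $copy = $string;
    utf8::encode($copy) if utf8::is_utf8($copy);
    return $copy;
}

# Hypothetical comparator: length in bytes, then byte order, then
# utf8ness with non-utf8 first.
sub by_sereal_key_order {
    my ($left, $right) = @_;
    my ($left_bytes, $right_bytes) = (raw_bytes($left), raw_bytes($right));
    return length($left_bytes) <=> length($right_bytes)
        || $left_bytes cmp $right_bytes
        || (utf8::is_utf8($left) ? 1 : 0) <=> (utf8::is_utf8($right) ? 1 : 0);
}

my @keys   = ('pear', 'fig', 'banana', 'kiwi', 'apple');
my @sorted = sort { by_sereal_key_order($a, $b) } @keys;

print join(', ', @sorted), "\n";    # fig, kiwi, pear, apple, banana
```

Note how this differs from plain "cmp" order, which would place "apple"
first.
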
265 no_shared_hashkeys
266
267 When the "no_shared_hashkeys" option is set to a true value, then the
268 encoder will disable the detection and elimination of repeated hash
269 keys. This only has an effect for serializing structures containing
270 hashes. By skipping the detection of repeated hash keys, performance
271 goes up a bit, but the size of the output can potentially be much
272 larger.
273
274 Do not disable this unless you have a reason to.
275
276 dedupe_strings
277
If this option is enabled/true then Sereal will use a hash to encode
279 duplicates of strings during serialization efficiently using (internal)
280 backreferences. This has a performance and memory penalty during
281 encoding so it defaults to off. On the other hand, data structures
282 with many duplicated strings will see a significant reduction in the
283 size of the encoded form. Currently only strings longer than 3
284 characters will be deduped, however this may change in the future.
285
286 Note that Sereal will perform certain types of deduping automatically
287 even without this option. In particular class names and hash keys (see
288 also the "no_shared_hashkeys" setting) are deduped regardless of this
289 option. Only enable this if you have good reason to believe that there
290 are many duplicated strings as values in your data structure.
291
292 Use of this option does not require an upgraded decoder (this option
293 was added in Sereal::Encoder 0.32). The deduping is performed in such a
294 way that older decoders should handle it just fine. In other words,
295 the output of a Sereal decoder should not depend on whether this option
296 was used during encoding. See also below: aliased_dedupe_strings.
297
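To see whether "dedupe_strings" is worthwhile for a given workload, one can
simply compare output sizes. The sketch below uses made-up data consisting
of many copies of one string value, and assumes Sereal::Decoder is installed
to confirm that the decoded result is unaffected.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder qw(decode_sereal);

# Hypothetical data: 100 separate scalars holding the same string value.
my $value = 'a fairly long repeated value';
my $data  = [ map { "$value" } 1 .. 100 ];

my $default_out = Sereal::Encoder->new->encode($data);
my $dedupe_out  = Sereal::Encoder->new({ dedupe_strings => 1 })->encode($data);

printf "default: %d bytes, dedupe_strings: %d bytes\n",
    length($default_out), length($dedupe_out);

# The decoded data should be the same either way.
my $same = join("\n", @{ decode_sereal($default_out) })
        eq join("\n", @{ decode_sereal($dedupe_out) });
print "decoded contents match: ", ($same ? "yes" : "no"), "\n";
```
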
298 aliased_dedupe_strings
299
300 This is an advanced option that should be used only after fully
301 understanding its ramifications.
302
303 This option enables a mode of operation that is similar to
304 dedupe_strings and if both options are set, aliased_dedupe_strings
305 takes precedence.
306
307 The behaviour of aliased_dedupe_strings differs from dedupe_strings in
308 that the duplicate occurrences of strings are emitted as Perl language
309 level aliases instead of as Sereal-internal backreferences. This means
310 that using this option actually produces a different output data
311 structure when decoding. The upshot is that with this option, the
312 application using (decoding) the data may save a lot of memory in some
313 situations but at the cost of potential action at a distance due to the
314 aliasing.
315
316 Beware: The test suite currently does not cover this option as well as
317 it probably should. Patches welcome.
318
319 use_standard_double
320
321 This option can be used to force Perls built with uselongdouble or
322 quadmath to use DOUBLE instead of the native floating point. This can
323 be helpful interoperating with Perls which do not support larger sized
324 floats. Note that "uselongdouble" means different things in different
325 places, so this option may be helpful for such builds. We do not enable
326 this option by default for backwards compatibility reasons, and because
327 doing so would lose precision.
328
329 protocol_version
330
331 Specifies the version of the Sereal protocol to emit. Valid are
332 integers between 1 and the current version. If not specified, the most
recent protocol version will be used. See also "use_protocol_v1".
334
335 It is strongly advised to use the latest protocol version outside of
336 migration periods.
337
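A brief sketch of pinning the protocol version, for example during a
migration. The check of the emitted version relies on an assumption taken
from the Sereal specification rather than from this document: that the byte
following the four-byte magic string carries the protocol version in its
low four bits.

```perl
use strict;
use warnings;
use Sereal::Encoder;

my $encoder  = Sereal::Encoder->new({ protocol_version => 3 });
my $document = $encoder->encode([ 1, 2, 3 ]);

# Assumption (from the Sereal specification, not from this document):
# the fifth byte of a document holds the version in its low four bits.
my $emitted_version = ord(substr($document, 4, 1)) & 0x0F;

print "emitted protocol version: $emitted_version\n";
```
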
338 use_protocol_v1
339
340 This option is deprecated in favour of the "protocol_version" option
341 (see above).
342
343 If set, the encoder will emit Sereal documents following protocol
344 version 1. This is strongly discouraged except for temporary
345 compatibility/migration purposes.
346
348 encode
349 Given a Perl data structure, serializes that data structure and returns
350 a binary string that can be turned back into the original data
351 structure by Sereal::Decoder. The method expects a data structure to
352 serialize as first argument, optionally followed by a header data
353 structure.
354
355 A header is intended for embedding small amounts of meta data, such as
356 routing information, in a document that allows users to avoid
deserializing the main body needlessly.
358
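The header mechanism might be used as sketched below. The routing data is
invented for the example, and reading the header back relies on the
"decode_only_header" method of Sereal::Decoder, which is not documented
here; consult the decoder's documentation before depending on it.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

my $encoder = Sereal::Encoder->new;

# Hypothetical payload and routing information.
my $body   = { payload => [ 1 .. 1000 ] };
my $header = { route => 'queue-a' };

# The header is passed as the optional second argument to encode().
my $document = $encoder->encode($body, $header);

# A consumer can inspect the header without decoding the body.
my $decoder = Sereal::Decoder->new;
my $routing = $decoder->decode_only_header($document);

print "route: $routing->{route}\n";
```
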
359 encode_to_file
    Sereal::Encoder->encode_to_file($file,$data,$append);
    $encoder->encode_to_file($file,$data,$append);

Encode the data specified and write it to the named file. If $append is
364 true then the written data is appended to any existing data, otherwise
365 any existing data will be overwritten. Dies if any errors occur during
366 writing the encoded data.
367
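A short sketch of the overwrite and append modes, writing to a temporary
file created with File::Temp. The data written is arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Sereal::Encoder;

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh or die "close failed: $!";

my $encoder = Sereal::Encoder->new;

# $append false: any existing contents are overwritten.
$encoder->encode_to_file($file, { run => 1 }, 0);
my $size_after_write = -s $file;

# $append true: a second document is added after the first.
$encoder->encode_to_file($file, { run => 2 }, 1);
my $size_after_append = -s $file;

print "after write: $size_after_write bytes, ",
      "after append: $size_after_append bytes\n";
```
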
369 sereal_encode_with_object
370 The functional interface that is equivalent to using "encode". Takes an
371 encoder object reference as first argument, followed by a data
372 structure and optional header to serialize.
373
374 This functional interface is marginally faster than the OO interface
375 since it avoids method resolution overhead and, on sufficiently modern
376 Perl versions, can usually avoid subroutine call overhead.
377
378 encode_sereal
379 The functional interface that is equivalent to using "new" and
380 "encode". Expects a data structure to serialize as first argument,
381 optionally followed by a hash reference of options (see documentation
382 for new()).
383
384 This function cannot be used for encoding a data structure with a
385 header. See "encode_sereal_with_header_data".
386
387 This functional interface is significantly slower than the OO interface
388 since it cannot reuse the encoder object.
389
390 encode_sereal_with_header_data
391 The functional interface that is equivalent to using "new" and
392 "encode". Expects a data structure and a header to serialize as first
393 and second arguments, optionally followed by a hash reference of
394 options (see documentation for new()).
395
396 This functional interface is significantly slower than the OO interface
397 since it cannot reuse the encoder object.
398
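The three ways of encoding described above can be compared side by side.
With default options they are expected to yield the same bytes for the same
input; the sample data is arbitrary.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal sereal_encode_with_object);

my $data    = [ 1, 'two', 3.5 ];
my $encoder = Sereal::Encoder->new;

# OO interface: reuses the encoder object.
my $via_method = $encoder->encode($data);

# Functional interface that also reuses the encoder object.
my $via_function = sereal_encode_with_object($encoder, $data);

# One-shot functional interface: sets up a fresh encoder on every call.
my $via_oneshot = encode_sereal($data);

my $all_equal = ($via_method eq $via_function && $via_method eq $via_oneshot);
print "all three interfaces agree: ", ($all_equal ? "yes" : "no"), "\n";
```
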
PERFORMANCE
See Sereal::Performance for detailed considerations on performance
401 tuning. Let it just be said that:
402
403 If you care about performance at all, then use
404 "sereal_encode_with_object" or the OO interface instead of
405 "encode_sereal". It's a significant difference in performance if you
406 are serializing small data structures.
407
408 The exact performance in time and space depends heavily on the data
409 structure to be serialized. Often there is a trade-off between space
410 and time. If in doubt, do your own testing and most importantly ALWAYS
411 TEST WITH REAL DATA. If you care purely about speed at the expense of
412 output size, you can use the "no_shared_hashkeys" option for a small
413 speed-up. If you need smaller output at the cost of higher CPU load and
414 more memory used during encoding/decoding, try the "dedupe_strings"
415 option and enable Snappy compression.
416
417 For ready-made comparison scripts, see the author_tools/bench.pl and
418 author_tools/dbench.pl programs that are part of this distribution.
419 Suffice to say that this library is easily competitive in both time and
420 space efficiency with the best alternatives.
421
FREEZE/THAW CALLBACK MECHANISM
Some objects do not lend themselves naturally to naive perl
424 datastructure level serialization. For instance XS code might use a
425 hidden structure that would not get serialized, or an object may
426 contain volatile data like a filehandle that would not be reconstituted
427 properly. To support cases like this "Sereal" supports a FREEZE and
428 THAW api. When objects are serialized their FREEZE method is asked for
429 a replacement representation, and when objects are deserialized their
430 THAW method is asked to convert that replacement back to something
431 useful.
432
433 This mechanism is enabled using the "freeze_callbacks" option of the
434 encoder. It is inspired by the equivalent mechanism in CBOR::XS. The
435 general mechanism is documented in the A GENERIC OBJECT SERIALIATION
436 PROTOCOL section of Types::Serialiser. Similar to CBOR using "CBOR",
437 Sereal uses the string "Sereal" as a serializer identifier for the
438 callbacks.
439
440 Here is a contrived example of a class implementing the "FREEZE" /
441 "THAW" mechanism.
442
    package File;

    use Moo;

    has 'path' => (is => 'ro');
    has 'fh' => (is => 'rw');

    # open file handle if necessary and return it
    sub get_fh {
        my $self = shift;
        # This could also be done with fancier Moo(se) syntax
        my $fh = $self->fh;
        if (not $fh) {
            open $fh, "<", $self->path or die $!;
            $self->fh($fh);
        }
        return $fh;
    }

    sub FREEZE {
        my ($self, $serializer) = @_;
        # Could switch on $serializer here: JSON, CBOR, Sereal, ...
        # But this case is so simple that it will work with ALL of them.
        # Do not try to serialize our file handle! Path will be enough
        # to recreate.
        return $self->path;
    }

    sub THAW {
        my ($class, $serializer, $data) = @_;
        # Turn back into object.
        return $class->new(path => $data);
    }
477
478 Why is the "FREEZE"/"THAW" mechanism important here? Our contrived
479 "File" class may contain a file handle which can't be serialized. So
480 "FREEZE" not only returns just the path (which is more compact than
481 encoding the actual object contents), but it strips the file handle
482 which can be lazily reopened on the other side of the
483 serialization/deserialization pipe. But this example also shows that a
484 naive implementation can easily end up with subtle bugs. A file handle
485 itself has state (position in file, etc). Thus the deserialization in
486 the above example won't accurately reproduce the original state. It
487 can't, of course, if it's deserialized in a different environment
488 anyway.
489
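A round trip through the mechanism might look like the sketch below. To
stay self-contained it uses a hypothetical "My::File" class written without
Moo, and it assumes Sereal::Decoder is installed and that the class is
loaded on the decoding side.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the File class above, written without Moo.
package My::File;

sub new {
    my ($class, %args) = @_;
    return bless { path => $args{path}, fh => undef }, $class;
}

sub path { return $_[0]{path} }

sub FREEZE {
    my ($self, $serializer) = @_;
    return $self->path;    # only the path is serialized
}

sub THAW {
    my ($class, $serializer, $data) = @_;
    return $class->new(path => $data);
}

package main;

use Sereal::Encoder;
use Sereal::Decoder;

# FREEZE is only consulted when freeze_callbacks is enabled.
my $encoder = Sereal::Encoder->new({ freeze_callbacks => 1 });
my $decoder = Sereal::Decoder->new;

my $original = My::File->new(path => '/tmp/example.txt');
my $copy     = $decoder->decode($encoder->encode($original));

print ref($copy), " with path ", $copy->path, "\n";
```
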
"Sereal::Encoder" is thread-safe on Perl 5.8.7 and higher. This means
492 "thread-safe" in the sense that if you create a new thread, all
493 "Sereal::Encoder" objects will become a reference to undef in the new
494 thread. This might change in a future release to become a full clone of
495 the encoder object.
496
CANONICAL REPRESENTATION
You might want to compare two data structures by comparing their
499 serialized byte strings. For that to work reliably the serialization
500 must take extra steps to ensure that identical data structures are
501 encoded into identical serialized byte strings (a so-called "canonical
502 representation").
503
504 Unfortunately in Perl there is no such thing as a "canonical
505 representation". Most people are interested in "structural
506 equivalence" but even that is less well defined than most people think.
507 For instance in the following example:
508
    my $array1= [ 0, 0 ];
    my $array2= do {
        my $zero= 0;
        sub{ \@_ }->($zero,$zero);
    };
514
515 the question of whether $array1 is structurally equivalent to $array2
516 is a subjective one. Sereal for instance would NOT consider them
517 equivalent but "Test::Deep" would. There are many examples of this in
518 Perl. Simply stringifying a number technically changes the scalar.
519 Storable would notice this, but Sereal generally would not.
520
521 Despite this as of 3.002 the Sereal encoder supports a "canonical"
522 option which will make a "best effort" attempt at producing a canonical
523 representation of a data structure. This mode is actually a
524 combination of several other modes which may also be enabled
independently. As and when we add new options to the encoder that
would assist in this regard, the "canonical" option will also enable them.
527 These options may come with a performance penalty so care should be
528 taken to read the Changes file and test the performance implications
529 when upgrading a system that uses this option.
530
It is important to note that using canonical representation to
determine if two data structures are different is subject to
false-positives. If two Sereal encodings are identical you can generally
assume that the two data structures are functionally equivalent from
the point of view of normal Perl code (XS code might disagree). However,
if two Sereal encodings differ, the data structures may still be
functionally equivalent. In practice it seems that the false-positive
rate is low, but your mileage may vary.
539
540 Some of the issues with producing a true canonical representation are
541 outlined below:
542
543 Sereal doesn't order the hash keys by default.
544 This can be enabled via the "sort_keys", which is itself enabled by
545 "canonical" option.
546
547 Sereal output is sensitive to refcounts
548 This can be somewhat mitigated by the use of "canonical_refs", see
549 above.
550
551 There are multiple valid Sereal documents that you can produce for the
552 same Perl data structure.
553 Just sorting hash keys is not enough. Some of the reasons are
554 outlined below. These issues are especially relevant when
555 considering language interoperability.
556
557 PAD bytes
558 A trivial example is PAD bytes which mean nothing and are
559 skipped. They mostly exist for encoder optimizations to prevent
560 certain nasty backtracking situations from becoming O(n) at the
561 cost of one byte of output. An explicit canonical mode would
562 have to outlaw them (or add more of them) and thus require a
much more complicated implementation of refcount/weakref
handling in the encoder while at the same time causing some
565 operations to go from O(1) to a full memcpy of everything after
566 the point of where we backtracked to. Nasty.
567
568 COPY tag
569 Another example is COPY. The COPY tag indicates that the next
570 element is an identical copy of a previous element (which is
571 itself forbidden from including COPY's other than for class
572 names). COPY is purely internal. The Perl/XS implementation
573 uses it to share hash keys and class names. One could use it
for other strings (theoretically), but doesn't for
time-efficiency reasons. We'd have to outlaw the use of this
(significant) optimization for canonicalization.
577
578 REF representation
579 Sereal represents a reference to an array as a sequence of tags
580 which, in its simplest form, reads REF, ARRAY $array_length
581 TAG1 TAG2 .... The separation of "REF" and "ARRAY" is
582 necessary to properly implement all of Perl's referencing and
583 aliasing semantics correctly. Quite frequently, however, your
584 array is only referenced once and plainly so. If it's also at
585 most 15 elements long, Sereal optimizes all of the "REF" and
586 "ARRAY" tags, as well as the length into a special one byte
587 ARRAYREF tag. This is a very significant optimization for
588 common cases. This, however, does mean that most arrays up to
589 15 elements could be represented in two different, yet
590 perfectly valid forms. ARRAYREF would have to be outlawed for a
591 properly canonical form. The exact same logic applies to HASH
592 vs. HASHREF. This behavior can be overridden by the
593 "canonical_refs" option, which disables use of HASHREF and
594 ARRAYREF.
595
596 Numeric representation
Sereal can represent integers in more than one form, similar to
how it can represent arrays and hashes in a full and a compact
form. For small integers (between -16 and +15 inclusive), Sereal
emits only one byte including the encoding of the type of data.
For larger integers, it can use either varints (positive only)
or zigzag encoding, which can also represent negative numbers.
For a canonical mode, the space optimizations would have to be
turned off and it would have to be explicitly specified whether
varint or zigzag encoding is to be used for encoding positive
integers.
606
607 Perl may choose to retain multiple representations of a scalar.
608 Specifically, it can convert integers, floating point numbers,
609 and strings on the fly and will aggressively cache the results.
610 Normally, it remembers which of the representations can be
611 considered canonical, that means, which can be used to recreate
612 the others reliably. For example, 0 and "0" can both be
613 considered canonical since they naturally transform into each
614 other. Beyond intrinsic ambiguity, there are ways to trick Perl
615 into allowing a single scalar to have distinct string, integer,
616 and floating point representations that are all flagged as
617 canonical, but can't be transformed into each other. These are
618 the so-called dualvars. Sereal cannot represent dualvars (and
619 that's a good thing).
620
621 Floating point values can appear to be the same but serialize
622 to different byte strings due to insignificant 'noise' in the
623 floating point representation. Sereal supports different
624 floating point precisions and will generally choose the most
625 compact that can represent your floating point number
626 correctly.
627
There are also a few cases where Sereal will produce different
documents for values that you might think are the same thing,
because if you compared them with "eq" or "==", Perl itself would
consider them equivalent. For the purposes of serialization,
however, they are not the same value.
633
634 A good example of these cases is where Test::Deep and Sereal's
635 canonical mode differ. We have tests for some of these cases in
t/030_canonical_vs_test_deep.t. Here are the issues we've noticed so
far:
638
639 Sereal considers ASCII strings with the UTF-8 flag to be different
640 from the same string without the UTF-8 flag
641 Consider:
642
    my $language_code = "en";

vs.:

    my $language_code = "en";
    utf8::upgrade($language_code);
649
650 Sereal's canonical mode will encode these strings differently,
651 as it should, since the UTF-8 flag will be passed along on
652 interpolation.
653
654 But this can be confusing if you're just getting some user-
655 supplied ASCII strings that you may inadvertently toggle the
656 UTF-8 flag on, e.g. because you're comparing an ASCII value in
657 a database to a value submitted in a UTF-8 web form.
658
659 Sereal will encode strings that look like numbers as strings,
660 unless they've been used in numeric context
661 I.e. these values will be encoded differently, respectively:
662
    my $IV_x = "12345";
    my $IV_y = "12345" + 0;
    my $NV_x = "12.345";
    my $NV_y = "12.345" + 0;
667
668 But as noted above something like Test::Deep will consider
669 these to be the same thing.
670
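Both cases above can be observed with a short script. The sketch encodes
each value before comparing it, since comparing a string with "==" uses it
in numeric context and could itself change how it is later encoded.

```perl
use strict;
use warnings;
use Sereal::Encoder;

my $encoder = Sereal::Encoder->new({ canonical => 1 });

# Case 1: same ASCII text, with and without the UTF-8 flag.
my $plain    = "en";
my $upgraded = "en";
utf8::upgrade($upgraded);

my $plain_bytes    = $encoder->encode($plain);
my $upgraded_bytes = $encoder->encode($upgraded);

# Case 2: a numeric-looking string versus an actual number.
my $string = "12345";
my $number = "12345" + 0;

my $string_bytes = $encoder->encode($string);
my $number_bytes = $encoder->encode($number);

# Compare the values only after encoding (see the note above).
my $text_equal   = ($plain eq $upgraded)  ? 1 : 0;
my $number_equal = ($string == $number)   ? 1 : 0;

print "eq says equal: $text_equal, encodings equal: ",
    ($plain_bytes eq $upgraded_bytes ? 1 : 0), "\n";
print "== says equal: $number_equal, encodings equal: ",
    ($string_bytes eq $number_bytes ? 1 : 0), "\n";
```
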
671 We might produce certain aggressive flags to the canonical mode in
672 the future to deal with this. For the cases noted above some
673 combination of turning the UTF-8 flag on on all strings, or
674 stripping it from strings that have it but are ASCII-only would
675 "work", similarly we could scan strings to see if they match
676 looks_like_number() and if so numify them.
677
678 This would produce output that either would be a lot bigger (having
679 to encode all numbers as strings), or would be more expensive to
680 generate (having to scan strings for numeric or non-ASCII context),
681 and for some cases like the UTF-8 flag munging wouldn't be suitable
for general use outside of canonicalization.
683
684 Often, people don't actually care about "canonical" in the strict sense
685 required for real identity checking. They just require a best-effort
686 sort of thing for caching. But it's a slippery slope!
687
688 In a nutshell, the "canonical" option may be sufficient for an
689 application which is simply serializing a cache key, and thus there's
690 little harm in an occasional false-negative, but think carefully before
691 applying Sereal in other use-cases.
692
694 Strings Or Numbers
695 Perl does not make a strong distinction between strings and
696 numbers, and from an internal point of view it can be difficult to
697 tell what the "right" representation is for a given variable.
698
699 Sereal tries to not be lossy. So if it detects that the string
700 value of a var, and the numeric value are different it will
701 generally round trip the *string* value. This means that "special"
702 strings often used in Perl function returns, like "0 but true", and
703 "0e0", will round trip in a way that their normal Perl semantics
704 are preserved. However this also means that "non canonical" values,
705 like " 100 ", which will numify as 100 without warnings, will round
706 trip as their string values.
707
Perl also has some operators, the bitwise operators ^, | and &,
which do different things depending on whether their arguments have
been used in numeric context, as the following examples show:

    perl -le'my $x="1"; $i=int($x); print unpack "H*", $x ^ "1"'
    30

    perl -le'my $x="1"; print unpack "H*", $x ^ "1"'
    00

    perl -le'my $x=" 1 "; $i=int($x); print unpack "H*", $x ^ "1"'
    30

    perl -le'my $x=" 1 "; print unpack "H*", $x ^ "1"'
    113120
723
724 Sereal currently cannot round trip this property properly.
725
726 An extreme case of this problem is that of "dualvars", which can be
727 created using the Scalar::Util::dualvar() function. This function
728 allows one to create variables which have string and integer values
729 which are completely unrelated to each other. Sereal currently
730 will choose the *string* value when it detects these items.
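
If both halves of a dualvar matter, one possible workaround is to split it into two plain values before encoding and to rebuild it after decoding. The sketch below uses only the core Scalar::Util module. The helpers split_dualvar and join_dualvar are hypothetical and not part of Sereal.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

# Turn a dualvar into a plain hash holding both of its values.
sub split_dualvar {
    my ($value) = @_;
    return { num => 0 + $value, str => "$value" };
}

# Rebuild a dualvar from the hash produced by split_dualvar().
sub join_dualvar {
    my ($parts) = @_;
    return dualvar($parts->{num}, $parts->{str});
}

my $dv    = dualvar(42, "forty-two");
my $parts = split_dualvar($dv);    # this hash can be serialized safely
my $back  = join_dualvar($parts);

printf "num=%d str=%s\n", $back, $back;
```

The hash returned by split_dualvar() holds an ordinary number and an ordinary string, so it does not depend on how the encoder treats dualvars.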
731
732 It is possible that a future release of the protocol will fix these
733 issues.
734
735 Booleans
736 As of Perl 5.36 and protocol version 5, Sereal supports booleans.
737 The new tags SRL_HDR_YES and SRL_HDR_NO represent Perl bools. The
738 old tags SRL_HDR_TRUE and SRL_HDR_FALSE, which represent the old
739 special variables, may still be generated, but beyond being
740 readonly these are equivalent to SRL_HDR_YES and SRL_HDR_NO.
741
743 For reporting bugs, please use the github bug tracker at
744 <http://github.com/Sereal/Sereal/issues>.
745
746 For support and discussion of Sereal, there are two Google Groups:
747
748 Announcements around Sereal (extremely low volume):
749 <https://groups.google.com/forum/?fromgroups#!forum/sereal-announce>
750
751 Sereal development list:
752 <https://groups.google.com/forum/?fromgroups#!forum/sereal-dev>
753
755 Yves Orton <demerphq@gmail.com>
756
757 Damian Gryski
758
759 Steffen Mueller <smueller@cpan.org>
760
761 Rafaël Garcia-Suarez
762
763 Ævar Arnfjörð Bjarmason <avar@cpan.org>
764
765 Tim Bunce
766
767 Daniel Dragan <bulkdd@cpan.org> (Windows support and bugfixes)
768
769 Zefram
770
771 Borislav Nikolov
772
773 Ivan Kruglov <ivan.kruglov@yahoo.com>
774
775 Some inspiration and code was taken from Marc Lehmann's excellent
776 JSON::XS module due to obvious overlap in problem domain. Thank you!
777
779 This module was originally developed for Booking.com. With approval
780 from Booking.com, this module was generalized and published on CPAN,
781 for which the authors would like to express their gratitude.
782
784 Copyright (C) 2012, 2013, 2014 by Steffen Mueller
785 Copyright (C) 2012, 2013, 2014 by Yves Orton
786
787 The license for the code in this distribution is the following, with
788 the exceptions listed below:
789
790 This library is free software; you can redistribute it and/or modify it
791 under the same terms as Perl itself.
792
793 Except portions taken from Marc Lehmann's code for the JSON::XS module,
794 which is licensed under the same terms as this module.
795
796 Also except the code for Snappy compression library, whose license is
797 reproduced below and which, to the best of our knowledge, is compatible
798 with this module's license. The license for the enclosed Snappy code
799 is:
800
801 Copyright 2011, Google Inc.
802 All rights reserved.
803
804 Redistribution and use in source and binary forms, with or without
805 modification, are permitted provided that the following conditions are
806 met:
807
808 * Redistributions of source code must retain the above copyright
809 notice, this list of conditions and the following disclaimer.
810 * Redistributions in binary form must reproduce the above
811 copyright notice, this list of conditions and the following disclaimer
812 in the documentation and/or other materials provided with the
813 distribution.
814 * Neither the name of Google Inc. nor the names of its
815 contributors may be used to endorse or promote products derived from
816 this software without specific prior written permission.
817
818 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
819 "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
820 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
821 A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
822 OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
823 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
824 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
825 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
826 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
827 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
828 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
829
830
831
832perl v5.36.0 2023-02-08 Sereal::Encoder(3)