1Sereal::Encoder(3) User Contributed Perl Documentation Sereal::Encoder(3)
2
3
4
6 Sereal::Encoder - Fast, compact, powerful binary serialization
7
9 use Sereal::Encoder qw(encode_sereal sereal_encode_with_object);
10
11 my $encoder = Sereal::Encoder->new({...options...});
12 my $out = $encoder->encode($structure);
13
14 # alternatively the functional interface:
15 $out = sereal_encode_with_object($encoder, $structure);
16
17 # much slower functional interface with no persistent objects:
18 $out = encode_sereal($structure, {... options ...});
19
21 This library implements an efficient, compact-output, and feature-rich
22 serializer using a binary protocol called Sereal. Its sister module
23 Sereal::Decoder implements a decoder for this format. The two are
24 released separately to allow for independent and safer upgrading. If
25 you care greatly about performance, consider reading the
26 Sereal::Performance documentation after finishing this document.
27
28 The Sereal protocol version emitted by this encoder implementation is
29 currently protocol version 4 by default.
30
31 The protocol specification and many other bits of documentation can be
32 found in the github repository. Right now, the specification is at
33 <https://github.com/Sereal/Sereal/blob/master/sereal_spec.pod>, there
34 is a discussion of the design objectives in
35 <https://github.com/Sereal/Sereal/blob/master/README.pod>, and the
36 output of our benchmarks can be seen at
37 <https://github.com/Sereal/Sereal/wiki/Sereal-Comparison-Graphs>. For
38 more information on getting the best performance out of Sereal, have a
39 look at the "PERFORMANCE" section below.
40
42 new
43 Constructor. Optionally takes a hash reference as first parameter. This
44 hash reference may contain any number of options that influence the
45 behaviour of the encoder.
46
Currently, the following options are recognized. None of them is on by
default.
49
50 compress
51
If this option is provided and true, compression of the document body is
53 enabled. As of Sereal version 4, three different compression
54 techniques are supported and can be enabled by setting "compress" to
55 the respective named constants (exportable from the "Sereal::Encoder"
56 module): Snappy (named constant: "SRL_SNAPPY"), Zlib ("SRL_ZLIB") and
57 Zstd ("SRL_ZSTD"). For your convenience, there is also a
58 "SRL_UNCOMPRESSED" constant.
59
60 If this option is set, then the Snappy-related options below are
61 ignored. They are otherwise recognized for compatibility only.
62
63 compress_threshold
64
65 The size threshold (in bytes) of the uncompressed output below which
66 compression is not even attempted even if enabled. Defaults to one
67 kilobyte (1024 bytes). Set this to 0 and "compress" to a
68 non-"SRL_UNCOMPRESSED" value to always attempt to compress. Note that
69 the document will not be compressed if the resulting size will be
70 bigger than the original size (even if "compress_threshold" is 0).
71
72 compress_level
73
If Zlib or Zstd compression is used, this option sets the compression
level. Zlib accepts levels from 1 (fastest) to 9 (best) and defaults
to 6. Zstd accepts levels from 1 (fastest) to 22 (best) and defaults
to 3.
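
As a rough illustration of how the three compression options combine,
the following sketch compares the output size with and without Zlib.
The sample data is made up and deliberately repetitive, so the numbers
say nothing about real workloads.

```perl
use strict;
use warnings;
use Sereal::Encoder qw(SRL_UNCOMPRESSED SRL_ZLIB);

# Made-up, highly repetitive sample data so that compression pays off.
my $data = { payload => 'abc' x 5000 };

my $plain = Sereal::Encoder->new({ compress => SRL_UNCOMPRESSED });
my $zlib  = Sereal::Encoder->new({
    compress           => SRL_ZLIB,
    compress_level     => 9,    # 1 (fastest) .. 9 (best) for Zlib
    compress_threshold => 0,    # always attempt compression
});

my $plain_size = length $plain->encode($data);
my $zlib_size  = length $zlib->encode($data);

printf "uncompressed: %d bytes, zlib: %d bytes\n", $plain_size, $zlib_size;
```

With real data the gain may be much smaller, so measure before
enabling compression in production.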
78
79 snappy
80
81 See also the "compress" option. This option is provided only for
82 compatibility with Sereal V1.
83
84 If set, the main payload of the Sereal document will be compressed
85 using Google's Snappy algorithm. This can yield anywhere from no effect
86 to significant savings on output size at rather low run time cost. If
87 in doubt, test with your data whether this helps or not.
88
89 The decoder (version 0.04 and up) will know how to handle Snappy-
90 compressed Sereal documents transparently.
91
92 Note: The "snappy_incr" and "snappy" options are identical in Sereal
93 protocol v2 and up (so by default). If using an older protocol version
94 (see "protocol_version" and "use_protocol_v1" options below) to emit
95 Sereal V1 documents, this emits non-incrementally decodable documents.
96 See "snappy_incr" in those cases.
97
98 snappy_incr
99
100 See also the "compress" option. This option is provided only for
101 compatibility with Sereal V1.
102
103 Same as the "snappy" option for default operation (that is in Sereal v2
104 or up).
105
106 In Sereal V1, enables a version of the Snappy protocol which is
107 suitable for incremental parsing of packets. See also the "snappy"
108 option above for more details.
109
110 snappy_threshold
111
112 See also the "compress" option. This option is provided only for
113 compatibility with Sereal V1.
114
115 This option is a synonym for the "compress_threshold" option, but only
116 if Snappy compression is enabled.
117
118 croak_on_bless
119
120 If this option is set, then the encoder will refuse to serialize
121 blessed references and throw an exception instead.
122
123 This can be important because blessed references can mean executing a
124 destructor on a remote system or generally executing code based on
125 data.
126
See also "no_bless_objects" to skip the blessing of objects. When both
flags are set, "croak_on_bless" has a higher precedence than
"no_bless_objects".
130
131 freeze_callbacks
132
133 This option was introduced in Sereal v2 and needs a Sereal v2 decoder.
134
135 If this option is set, the encoder will check for and possibly invoke
136 the "FREEZE" method on any object in the input data. An object that was
137 serialized using its "FREEZE" method will have its corresponding "THAW"
138 class method called during deserialization. The exact semantics are
139 documented below under "FREEZE/THAW CALLBACK MECHANISM".
140
141 Beware that using this functionality means a significant slowdown for
142 object serialization. Even when serializing objects without a "FREEZE"
143 method, the additional method look up will cost a small amount of
144 runtime. Yes, "Sereal::Encoder" is so fast that this may make a
145 difference.
146
147 no_bless_objects
148
149 If this option is set, then the encoder will serialize blessed
150 references without the bless information and provide plain data
151 structures instead.
152
153 See also the "croak_on_bless" option above for more details.
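
The difference between "croak_on_bless" and "no_bless_objects" can be
sketched as follows. The class name "My::Point" is hypothetical, and
the example assumes that Sereal::Decoder is installed to inspect the
result.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use Sereal::Encoder;
use Sereal::Decoder;

# 'My::Point' is a hypothetical class used only for this illustration.
my $object = bless { x => 1, y => 2 }, 'My::Point';

# croak_on_bless: encoding a blessed reference throws an exception.
my $strict  = Sereal::Encoder->new({ croak_on_bless => 1 });
my $refused = !eval { $strict->encode($object); 1 };
print "croak_on_bless refused the object\n" if $refused;

# no_bless_objects: the class name is dropped and a plain hash is emitted.
my $plain   = Sereal::Encoder->new({ no_bless_objects => 1 });
my $decoded = Sereal::Decoder->new->decode($plain->encode($object));
print "decoded value is ", (blessed($decoded) ? "blessed" : "a plain hash"), "\n";
```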
154
155 undef_unknown
156
157 If set, unknown/unsupported data structures will be encoded as "undef"
158 instead of throwing an exception.
159
160 Mutually exclusive with "stringify_unknown". See also "warn_unknown"
161 below.
162
163 stringify_unknown
164
165 If set, unknown/unsupported data structures will be stringified and
166 encoded as that string instead of throwing an exception. The
167 stringification may cause a warning to be emitted by perl.
168
169 Mutually exclusive with "undef_unknown". See also "warn_unknown"
170 below.
171
172 warn_unknown
173
174 Only has an effect if "undef_unknown" or "stringify_unknown" are
175 enabled.
176
177 If set to a positive integer, any unknown/unsupported data structure
178 encountered will emit a warning. If set to a negative integer, it will
179 warn for unsupported data structures just the same as for a positive
180 value with one exception: For blessed, unsupported items that have
181 string overloading, we silently stringify without warning.
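
A minimal sketch of the three behaviours, assuming that a code
reference counts as an unsupported data structure and that
Sereal::Decoder is available for decoding:

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

# A code reference serves as the unsupported item in this sketch.
my $data    = { name => 'job', callback => sub { 42 } };
my $decoder = Sereal::Decoder->new;

# Default behaviour: the encoder throws an exception.
my $default_ok = eval { Sereal::Encoder->new->encode($data); 1 };

# undef_unknown: the unsupported item is replaced by undef.
my $undef_enc = Sereal::Encoder->new({ undef_unknown => 1 });
my $as_undef  = $decoder->decode($undef_enc->encode($data));

# stringify_unknown: the unsupported item is replaced by its string form.
my $string_enc = Sereal::Encoder->new({ stringify_unknown => 1 });
my $as_string  = $decoder->decode($string_enc->encode($data));

print "default encoder ", ($default_ok ? "succeeded" : "died"), "\n";
print "stringified callback: $as_string->{callback}\n";
```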
182
183 max_recursion_depth
184
185 "Sereal::Encoder" is recursive. If you pass it a Perl data structure
186 that is deeply nested, it will eventually exhaust the C stack.
187 Therefore, there is a limit on the depth of recursion that is accepted.
188 It defaults to 10000 nested calls. You may choose to override this
189 value with the "max_recursion_depth" option. Beware that setting it too
190 high can cause hard crashes, so only do that if you KNOW that it is
191 safe to do so.
192
193 Do note that the setting is somewhat approximate. Setting it to 10000
194 may break at somewhere between 9997 and 10003 nested structures
195 depending on their types.
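
The following sketch lowers the limit to an artificially small value
to show the failure mode without stressing the C stack. The depth of
50 and the limit of 10 are arbitrary illustration values.

```perl
use strict;
use warnings;
use Sereal::Encoder;

# Build an array nested 50 levels deep.
my $deep   = [];
my $cursor = $deep;
for (1 .. 50) {
    my $next = [];
    push @$cursor, $next;
    $cursor = $next;
}

# An artificially low limit makes the encoder refuse the structure.
my $limited    = Sereal::Encoder->new({ max_recursion_depth => 10 });
my $limited_ok = eval { $limited->encode($deep); 1 };
my $error      = $@;

# The default limit is far above 50 levels.
my $default_ok = eval { Sereal::Encoder->new->encode($deep); 1 };

print "limit 10: ", ($limited_ok ? "encoded" : "refused"), "\n";
print "default:  ", ($default_ok ? "encoded" : "refused"), "\n";
```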
196
197 canonical
198
Enable all options which are related to producing canonical output, so
that two structures with similar contents produce the same serialized
form.
202
203 See the caveats elsewhere in this document about producing canonical
204 output.
205
206 Currently sets the default for the following parameters:
207 "canonical_refs" and "sort_keys". If the option is explicitly set then
208 this setting is ignored. More options may be added in the future.
209
210 You are warned that use of this option may incur additional performance
211 penalties in a future release by enabling other options than those
212 listed here.
213
214 canonical_refs
215
Normally "Sereal::Encoder" will use ARRAYREF and HASHREF tags when the
item contains less than 16 items, and is not referenced more than once.
218 This flag will override this optimization and use a standard REFN ARRAY
219 style tag output. This is primarily useful for producing canonical
220 output and for testing Sereal itself.
221
222 See "CANONICAL REPRESENTATION" for why you might want to use this, and
223 for the various caveats involved.
224
225 sort_keys
226
227 Normally "Sereal::Encoder" will output hashes in whatever order is
228 convenient, generally that used by perl to actually store the hash, or
229 whatever order was returned by a tied hash.
230
231 If this option is enabled then the Encoder will sort the keys before
232 outputting them. It uses more memory, and is quite a bit slower than
233 the default.
234
235 Generally speaking this should mean that a hash and a copy should
236 produce the same output. Nevertheless the user is warned that Perl has
237 a way of "morphing" variables on use, and some of its rules are a
238 little arcane (for instance utf8 keys), and so two hashes that might
239 appear to be the same might still produce different output as far as
240 Sereal is concerned.
241
242 As of 3.006_007 (prerelease candidate for 3.007) the sort order has
243 been changed to the following: order by length of keys (in bytes)
244 ascending, then by byte order of the raw underlying string, then by
245 utf8ness, with non-utf8 first. This order was chosen because it is the
246 most efficient to implement, both in terms of memory and time. This
247 sort order is enabled when sort_keys is set to 1.
248
249 You may also produce output in Perl "cmp" order, by setting sort_keys
250 to 2. And for backwards compatibility you may also produce output in
251 reverse Perl "cmp" order by setting sort_keys to 3. Prior to 3.006_007
252 this was the only sort order possible, although it was not explicitly
253 defined what it was.
254
255 Note that comparatively speaking both of the "cmp" sort orders are slow
and memory inefficient. Unless you have a really good reason, stick to
257 the default which is fast and as lean as possible.
258
259 Unless you are concerned with "cross process canonical representation"
260 then it doesn't matter what option you choose.
261
262 See "CANONICAL REPRESENTATION" for why you might want to use this, and
263 for the various caveats involved.
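
As a small sketch of what key sorting buys you, the two hashes below
hold the same keys and values but are filled in a different order.
With "canonical" enabled (which turns on "sort_keys" and
"canonical_refs"), both are expected to serialize to the same bytes.
The key names are arbitrary.

```perl
use strict;
use warnings;
use Sereal::Encoder;

my @keys = qw(alpha beta gamma delta);

# Same contents, different insertion order.
my (%first, %second);
$first{$_}  = length $_ for @keys;
$second{$_} = length $_ for reverse @keys;

my $encoder = Sereal::Encoder->new({ canonical => 1 });

my $blob_first  = $encoder->encode(\%first);
my $blob_second = $encoder->encode(\%second);

print "identical output: ", ($blob_first eq $blob_second ? "yes" : "no"), "\n";
```

Keep in mind that this only holds while the values themselves are
represented identically inside perl.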
264
265 no_shared_hashkeys
266
267 When the "no_shared_hashkeys" option is set to a true value, then the
268 encoder will disable the detection and elimination of repeated hash
269 keys. This only has an effect for serializing structures containing
270 hashes. By skipping the detection of repeated hash keys, performance
271 goes up a bit, but the size of the output can potentially be much
272 larger.
273
274 Do not disable this unless you have a reason to.
275
276 dedupe_strings
277
If this option is enabled/true then Sereal will use a hash to encode
279 duplicates of strings during serialization efficiently using (internal)
280 backreferences. This has a performance and memory penalty during
281 encoding so it defaults to off. On the other hand, data structures
282 with many duplicated strings will see a significant reduction in the
283 size of the encoded form. Currently only strings longer than 3
284 characters will be deduped, however this may change in the future.
285
286 Note that Sereal will perform certain types of deduping automatically
287 even without this option. In particular class names and hash keys (see
288 also the "no_shared_hashkeys" setting) are deduped regardless of this
289 option. Only enable this if you have good reason to believe that there
290 are many duplicated strings as values in your data structure.
291
292 Use of this option does not require an upgraded decoder (this option
293 was added in Sereal::Encoder 0.32). The deduping is performed in such a
294 way that older decoders should handle it just fine. In other words,
295 the output of a Sereal decoder should not depend on whether this option
296 was used during encoding. See also below: aliased_dedupe_strings.
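
To get a feeling for the effect of "dedupe_strings", this sketch
encodes a made-up list of records that all share the same string
value and compares the output sizes. It assumes Sereal::Decoder is
installed to confirm that the decoded data is unaffected.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

# 200 records that repeat the same (made-up) string value.
my $records = [ map { +{ status => 'processing-completed-successfully' } } 1 .. 200 ];

my $plain_blob  = Sereal::Encoder->new->encode($records);
my $dedupe_blob = Sereal::Encoder->new({ dedupe_strings => 1 })->encode($records);

printf "default: %d bytes, dedupe_strings: %d bytes\n",
    length $plain_blob, length $dedupe_blob;

# The decoded data does not depend on the option.
my $decoded = Sereal::Decoder->new->decode($dedupe_blob);
print "last status: $decoded->[-1]{status}\n";
```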
297
298 aliased_dedupe_strings
299
300 This is an advanced option that should be used only after fully
301 understanding its ramifications.
302
303 This option enables a mode of operation that is similar to
304 dedupe_strings and if both options are set, aliased_dedupe_strings
305 takes precedence.
306
307 The behaviour of aliased_dedupe_strings differs from dedupe_strings in
308 that the duplicate occurrences of strings are emitted as Perl language
309 level aliases instead of as Sereal-internal backreferences. This means
310 that using this option actually produces a different output data
311 structure when decoding. The upshot is that with this option, the
312 application using (decoding) the data may save a lot of memory in some
313 situations but at the cost of potential action at a distance due to the
314 aliasing.
315
316 Beware: The test suite currently does not cover this option as well as
317 it probably should. Patches welcome.
318
319 protocol_version
320
321 Specifies the version of the Sereal protocol to emit. Valid are
322 integers between 1 and the current version. If not specified, the most
recent protocol version will be used. See also "use_protocol_v1".
324
325 It is strongly advised to use the latest protocol version outside of
326 migration periods.
327
328 use_protocol_v1
329
330 This option is deprecated in favour of the "protocol_version" option
331 (see above).
332
333 If set, the encoder will emit Sereal documents following protocol
334 version 1. This is strongly discouraged except for temporary
335 compatibility/migration purposes.
336
338 encode
339 Given a Perl data structure, serializes that data structure and returns
340 a binary string that can be turned back into the original data
341 structure by Sereal::Decoder. The method expects a data structure to
342 serialize as first argument, optionally followed by a header data
343 structure.
344
345 A header is intended for embedding small amounts of meta data, such as
346 routing information, in a document that allows users to avoid
deserializing the main body needlessly.
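
A short sketch of the header use case follows. The header contents
("route") are invented for illustration, and reading the header back
relies on the "decode_only_header" method of Sereal::Decoder, which is
documented in that module rather than here.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

my $encoder = Sereal::Encoder->new;

# Large body plus a small header carrying hypothetical routing data.
my $body   = { rows => [ 1 .. 1000 ] };
my $header = { route => 'queue-a' };
my $blob   = $encoder->encode($body, $header);

# A router can inspect the header without decoding the body.
my $decoder  = Sereal::Decoder->new;
my $routing  = $decoder->decode_only_header($blob);
print "route: $routing->{route}\n";

# The final consumer decodes the body as usual.
my $payload = $decoder->decode($blob);
print "rows: ", scalar @{ $payload->{rows} }, "\n";
```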
348
349 encode_to_file
350 Sereal::Encoder->encode_to_file($file,$data,$append);
351 $encoder->encode_to_file($file,$data,$append);
352
Encode the data specified and write it to the named file. If $append is
354 true then the written data is appended to any existing data, otherwise
355 any existing data will be overwritten. Dies if any errors occur during
356 writing the encoded data.
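
The overwrite and append modes can be sketched like this, using a
temporary file from File::Temp so that nothing permanent is written:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Sereal::Encoder;

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

my $encoder  = Sereal::Encoder->new;
my $record   = [ 1, 'two', 3.5 ];
my $one_size = length $encoder->encode($record);

$encoder->encode_to_file($file, $record);       # overwrite existing data
my $size_after_write = -s $file;

$encoder->encode_to_file($file, $record, 1);    # append a second document
my $size_after_append = -s $file;

print "one document: $one_size bytes\n";
print "file size: $size_after_write, then $size_after_append after appending\n";
```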
357
359 sereal_encode_with_object
360 The functional interface that is equivalent to using "encode". Takes an
361 encoder object reference as first argument, followed by a data
362 structure and optional header to serialize.
363
364 This functional interface is marginally faster than the OO interface
365 since it avoids method resolution overhead and, on sufficiently modern
366 Perl versions, can usually avoid subroutine call overhead.
367
368 encode_sereal
369 The functional interface that is equivalent to using "new" and
370 "encode". Expects a data structure to serialize as first argument,
371 optionally followed by a hash reference of options (see documentation
372 for "new()").
373
374 This function cannot be used for encoding a data structure with a
375 header. See "encode_sereal_with_header_data".
376
377 This functional interface is significantly slower than the OO interface
378 since it cannot reuse the encoder object.
379
380 encode_sereal_with_header_data
381 The functional interface that is equivalent to using "new" and
382 "encode". Expects a data structure and a header to serialize as first
383 and second arguments, optionally followed by a hash reference of
384 options (see documentation for "new()").
385
386 This functional interface is significantly slower than the OO interface
387 since it cannot reuse the encoder object.
388
PERFORMANCE
390 See Sereal::Performance for detailed considerations on performance
391 tuning. Let it just be said that:
392
393 If you care about performance at all, then use
394 "sereal_encode_with_object" or the OO interface instead of
395 "encode_sereal". It's a significant difference in performance if you
396 are serializing small data structures.
397
398 The exact performance in time and space depends heavily on the data
399 structure to be serialized. Often there is a trade-off between space
400 and time. If in doubt, do your own testing and most importantly ALWAYS
401 TEST WITH REAL DATA. If you care purely about speed at the expense of
402 output size, you can use the "no_shared_hashkeys" option for a small
403 speed-up. If you need smaller output at the cost of higher CPU load and
404 more memory used during encoding/decoding, try the "dedupe_strings"
405 option and enable Snappy compression.
406
407 For ready-made comparison scripts, see the author_tools/bench.pl and
408 author_tools/dbench.pl programs that are part of this distribution.
409 Suffice to say that this library is easily competitive in both time and
410 space efficiency with the best alternatives.
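
If you want a quick impression of the interface overhead on your own
machine, a sketch along these lines can help. It uses the core
Benchmark module and a deliberately tiny, made-up data structure. No
timings are shown here because they depend entirely on the system.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Sereal::Encoder qw(encode_sereal sereal_encode_with_object);

my $encoder = Sereal::Encoder->new;
my $small   = [ 1, 2, 'three' ];

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    oo_encode                 => sub { $encoder->encode($small) },
    sereal_encode_with_object => sub { sereal_encode_with_object($encoder, $small) },
    encode_sereal             => sub { encode_sereal($small) },
});

# All three interfaces produce the same document for the same options.
my $oo_blob   = $encoder->encode($small);
my $obj_blob  = sereal_encode_with_object($encoder, $small);
my $func_blob = encode_sereal($small);
```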
411
FREEZE/THAW CALLBACK MECHANISM
413 This mechanism is enabled using the "freeze_callbacks" option of the
414 encoder. It is inspired by the equivalent mechanism in CBOR::XS and
415 differs only in one minor detail, explained below. The general
416 mechanism is documented in the A GENERIC OBJECT SERIALIATION PROTOCOL
417 section of Types::Serialiser. Similar to CBOR using "CBOR", Sereal
418 uses the string "Sereal" as a serializer identifier for the callbacks.
419
420 The one difference to the mechanism as supported by CBOR is that in
421 Sereal, the "FREEZE" callback must return a single value. That value
422 can be any data structure supported by Sereal (hopefully without
423 causing infinite recursion by including the original object). But
424 "FREEZE" can't return a list as with CBOR. This should not be any
425 practical limitation whatsoever. Just return an array reference instead
426 of a list.
427
428 Here is a contrived example of a class implementing the "FREEZE" /
429 "THAW" mechanism.
430
431 package
432 File;
433
434 use Moo;
435
436 has 'path' => (is => 'ro');
437 has 'fh' => (is => 'rw');
438
439 # open file handle if necessary and return it
440 sub get_fh {
441 my $self = shift;
442 # This could also be done with fancier Moo(se) syntax
443 my $fh = $self->fh;
444 if (not $fh) {
445 open $fh, "<", $self->path or die $!;
446 $self->fh($fh);
447 }
448 return $fh;
449 }
450
451 sub FREEZE {
452 my ($self, $serializer) = @_;
453 # Could switch on $serializer here: JSON, CBOR, Sereal, ...
454 # But this case is so simple that it will work with ALL of them.
455 # Do not try to serialize our file handle! Path will be enough
456 # to recreate.
457 return $self->path;
458 }
459
460 sub THAW {
461 my ($class, $serializer, $data) = @_;
462 # Turn back into object.
463 return $class->new(path => $data);
464 }
465
466 Why is the "FREEZE"/"THAW" mechanism important here? Our contrived
467 "File" class may contain a file handle which can't be serialized. So
468 "FREEZE" not only returns just the path (which is more compact than
469 encoding the actual object contents), but it strips the file handle
470 which can be lazily reopened on the other side of the
471 serialization/deserialization pipe. But this example also shows that a
472 naive implementation can easily end up with subtle bugs. A file handle
473 itself has state (position in file, etc). Thus the deserialization in
474 the above example won't accurately reproduce the original state. It
475 can't, of course, if it's deserialized in a different environment
476 anyway.
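
To see the callbacks in action without the Moo dependency, here is a
self-contained sketch. "My::LogFile" is a hypothetical stand-in for
the File class above, and the example assumes that Sereal::Decoder
invokes "THAW" automatically, as described in its documentation.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

# Hypothetical stand-in for the File class above, written without Moo.
package My::LogFile {
    sub new  { my ($class, %args) = @_; return bless { path => $args{path} }, $class }
    sub path { return $_[0]{path} }

    # Return a single value: the path is enough to recreate the object.
    sub FREEZE {
        my ($self, $serializer) = @_;
        return $self->path;
    }

    # Class method that rebuilds the object from the frozen value.
    sub THAW {
        my ($class, $serializer, $data) = @_;
        return $class->new(path => $data);
    }
}

my $encoder = Sereal::Encoder->new({ freeze_callbacks => 1 });
my $decoder = Sereal::Decoder->new;

my $original = My::LogFile->new(path => '/tmp/example.log');
my $copy     = $decoder->decode($encoder->encode($original));

print ref($copy), " with path ", $copy->path, "\n";
```

Without "freeze_callbacks", the encoder would serialize the object's
internal hash instead of calling "FREEZE".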
477
"Sereal::Encoder" is thread-safe on Perl 5.8.7 and higher. This means
480 "thread-safe" in the sense that if you create a new thread, all
481 "Sereal::Encoder" objects will become a reference to undef in the new
482 thread. This might change in a future release to become a full clone of
483 the encoder object.
484
CANONICAL REPRESENTATION
486 You might want to compare two data structures by comparing their
487 serialized byte strings. For that to work reliably the serialization
488 must take extra steps to ensure that identical data structures are
489 encoded into identical serialized byte strings (a so-called "canonical
490 representation").
491
492 Unfortunately in Perl there is no such thing as a "canonical
493 representation". Most people are interested in "structural
494 equivalence" but even that is less well defined than most people think.
495 For instance in the following example:
496
497 my $array1= [ 0, 0 ];
498 my $array2= do {
499 my $zero= 0;
500 sub{ \@_ }->($zero,$zero);
501 };
502
503 the question of whether $array1 is structurally equivalent to $array2
504 is a subjective one. Sereal for instance would NOT consider them
505 equivalent but "Test::Deep" would. There are many examples of this in
506 Perl. Simply stringifying a number technically changes the scalar.
507 Storable would notice this, but Sereal generally would not.
508
509 Despite this as of 3.002 the Sereal encoder supports a "canonical"
510 option which will make a "best effort" attempt at producing a canonical
511 representation of a data structure. This mode is actually a
combination of several other modes which may also be enabled
independently. As and when we add new options to the encoder that
would assist in this regard, the "canonical" option will also enable them.
515 These options may come with a performance penalty so care should be
516 taken to read the Changes file and test the performance implications
517 when upgrading a system that uses this option.
518
519 It is important to note that using canonical representation to
520 determine if two data structures are different is subject to false-
521 positives. If two Sereal encodings are identical you can generally
522 assume that the two data structures are functionally equivalent from
523 the point of view of normal Perl code (XS code might disagree). However
524 if two Sereal encodings differ the data structures may actually be
functionally equivalent. In practice it seems the false-positive
rate is low, but your mileage may vary.
527
528 Some of the issues with producing a true canonical representation are
529 outlined below:
530
531 Sereal doesn't order the hash keys by default.
This can be enabled via the "sort_keys" option, which is itself enabled
by the "canonical" option.
534
535 Sereal output is sensitive to refcounts
536 This can be somewhat mitigated by the use of "canonical_refs", see
537 above.
538
539 There are multiple valid Sereal documents that you can produce for the
540 same Perl data structure.
541 Just sorting hash keys is not enough. Some of the reasons are
542 outlined below. These issues are especially relevant when
543 considering language interoperability.
544
545 PAD bytes
546 A trivial example is PAD bytes which mean nothing and are
547 skipped. They mostly exist for encoder optimizations to prevent
548 certain nasty backtracking situations from becoming O(n) at the
549 cost of one byte of output. An explicit canonical mode would
550 have to outlaw them (or add more of them) and thus require a
much more complicated implementation of refcount/weakref
handling in the encoder while at the same time causing some
553 operations to go from O(1) to a full memcpy of everything after
554 the point of where we backtracked to. Nasty.
555
556 COPY tag
557 Another example is COPY. The COPY tag indicates that the next
558 element is an identical copy of a previous element (which is
559 itself forbidden from including COPY's other than for class
560 names). COPY is purely internal. The Perl/XS implementation
561 uses it to share hash keys and class names. One could use it
for other strings (theoretically), but doesn't for time-
efficiency reasons. We'd have to outlaw the use of this
(significant) optimization for canonicalization.
565
566 REF representation
567 Sereal represents a reference to an array as a sequence of tags
568 which, in its simplest form, reads REF, ARRAY $array_length
569 TAG1 TAG2 .... The separation of "REF" and "ARRAY" is
570 necessary to properly implement all of Perl's referencing and
571 aliasing semantics correctly. Quite frequently, however, your
572 array is only referenced once and plainly so. If it's also at
573 most 15 elements long, Sereal optimizes all of the "REF" and
574 "ARRAY" tags, as well as the length into a special one byte
575 ARRAYREF tag. This is a very significant optimization for
576 common cases. This, however, does mean that most arrays up to
577 15 elements could be represented in two different, yet
578 perfectly valid forms. ARRAYREF would have to be outlawed for a
579 properly canonical form. The exact same logic applies to HASH
580 vs. HASHREF. This behavior can be overridden by the
581 "canonical_refs" option, which disables use of HASHREF and
582 ARRAYREF.
583
584 Numeric representation
Similar to how Sereal can represent arrays and hashes in a full
and a compact form, it has more than one encoding for integers.
For small integers (between -16 and +15 inclusive), Sereal emits
only one byte including the encoding of the type of data. For
larger integers, it can use either varints (positive only) or
zigzag encoding, which can also represent negative numbers. For
a canonical mode, the space optimizations would have to be turned
off and it would have to be explicitly specified whether varint
or zigzag encoding is to be used for encoding positive integers.
594
595 Perl may choose to retain multiple representations of a scalar.
596 Specifically, it can convert integers, floating point numbers,
597 and strings on the fly and will aggressively cache the results.
598 Normally, it remembers which of the representations can be
599 considered canonical, that means, which can be used to recreate
600 the others reliably. For example, 0 and "0" can both be
601 considered canonical since they naturally transform into each
602 other. Beyond intrinsic ambiguity, there are ways to trick Perl
603 into allowing a single scalar to have distinct string, integer,
604 and floating point representations that are all flagged as
605 canonical, but can't be transformed into each other. These are
606 the so-called dualvars. Sereal cannot represent dualvars (and
607 that's a good thing).
608
609 Floating point values can appear to be the same but serialize
610 to different byte strings due to insignificant 'noise' in the
611 floating point representation. Sereal supports different
612 floating point precisions and will generally choose the most
613 compact that can represent your floating point number
614 correctly.
615
There are also a few cases where Sereal will produce different
documents for values that you might think are the same thing,
because if you compared them with "eq" or "==", perl itself
would consider them equivalent. For the purposes of
serialization, however, they are not the same value.
621
622 A good example of these cases is where Test::Deep and Sereal's
623 canonical mode differ. We have tests for some of these cases in
t/030_canonical_vs_test_deep.t. Here are the issues we've noticed so
far:
626
627 Sereal considers ASCII strings with the UTF-8 flag to be different
628 from the same string without the UTF-8 flag
629 Consider:
630
631 my $language_code = "en";
632
vs.:
634
635 my $language_code = "en";
utf8::upgrade($language_code);
637
638 Sereal's canonical mode will encode these strings differently,
639 as it should, since the UTF-8 flag will be passed along on
640 interpolation.
641
642 But this can be confusing if you're just getting some user-
643 supplied ASCII strings that you may inadvertently toggle the
644 UTF-8 flag on, e.g. because you're comparing an ASCII value in
645 a database to a value submitted in a UTF-8 web form.
646
647 Sereal will encode strings that look like numbers as strings,
648 unless they've been used in numeric context
649 I.e. these values will be encoded differently, respectively:
650
651 my $IV_x = "12345";
652 my $IV_y = "12345" + 0;
653 my $NV_x = "12.345";
654 my $NV_y = "12.345" + 0;
655
656 But as noted above something like Test::Deep will consider
657 these to be the same thing.
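
Both issues can be observed with a few lines of code. The sketch
encodes every value before comparing anything, because a comparison
can itself change how perl stores a scalar. The variable names are
arbitrary.

```perl
use strict;
use warnings;
use Sereal::Encoder;

my $encoder = Sereal::Encoder->new({ canonical => 1 });

# Same characters, different UTF-8 flag.
my $plain    = "en";
my $upgraded = "en";
utf8::upgrade($upgraded);

# Same number, once as a string and once as an integer.
my $as_string = "12345";
my $as_number = "12345" + 0;

# Encode first: comparing the values below may alter their internal flags.
my %blob = (
    plain     => $encoder->encode($plain),
    upgraded  => $encoder->encode($upgraded),
    as_string => $encoder->encode($as_string),
    as_number => $encoder->encode($as_number),
);

my $utf8_differs    = $blob{plain} ne $blob{upgraded};
my $numeric_differs = $blob{as_string} ne $blob{as_number};

print "eq says equal: ", ($plain eq $upgraded ? "yes" : "no"),
      ", encodings differ: ", ($utf8_differs ? "yes" : "no"), "\n";
print "== says equal: ", ($as_string == $as_number ? "yes" : "no"),
      ", encodings differ: ", ($numeric_differs ? "yes" : "no"), "\n";
```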
658
659 We might produce certain aggressive flags to the canonical mode in
660 the future to deal with this. For the cases noted above some
661 combination of turning the UTF-8 flag on on all strings, or
662 stripping it from strings that have it but are ASCII-only would
663 "work", similarly we could scan strings to see if they match
664 "looks_like_number()" and if so numify them.
665
666 This would produce output that either would be a lot bigger (having
667 to encode all numbers as strings), or would be more expensive to
668 generate (having to scan strings for numeric or non-ASCII context),
669 and for some cases like the UTF-8 flag munging wouldn't be suitable
for general use outside of canonicalization.
671
672 Often, people don't actually care about "canonical" in the strict sense
673 required for real identity checking. They just require a best-effort
674 sort of thing for caching. But it's a slippery slope!
675
676 In a nutshell, the "canonical" option may be sufficient for an
677 application which is simply serializing a cache key, and thus there's
678 little harm in an occasional false-negative, but think carefully before
679 applying Sereal in other use-cases.
680
682 Strings Or Numbers
683 Perl does not make a strong distinction between strings and
684 numbers, and from an internal point of view it can be difficult to
685 tell what the "right" representation is for a given variable.
686
687 Sereal tries to not be lossy. So if it detects that the string
688 value of a var, and the numeric value are different it will
689 generally round trip the *string* value. This means that "special"
690 strings often used in Perl function returns, like "0 but true", and
691 "0e0", will round trip in a way that their normal Perl semantics
692 are preserved. However this also means that "non canonical" values,
693 like " 100 ", which will numify as 100 without warnings, will round
694 trip as their string values.
695
696 Perl also has some operators, the binary operators, ^, | and &,
697 which do different things depending on whether their arguments had
698 been used in numeric context as the following examples show:
699
700 perl -le'my $x="1"; $i=int($x); print unpack "H*", $x ^ "1"'
701 30
702
703 perl -le'my $x="1"; print unpack "H*", $x ^ "1"'
704 00
705
706 perl -le'my $x=" 1 "; $i=int($x); print unpack "H*", $x ^ "1"'
707 30
708
709 perl -le'my $x=" 1 "; print unpack "H*", $x ^ "1"'
710 113120
711
712 Sereal currently cannot round trip this property properly.
713
714 An extreme case of this problem is that of "dualvars", which can be
715 created using the Scalar::Util::dualvar() function. This function
716 allows one to create variables which have string and integer values
717 which are completely unrelated to each other. Sereal currently
718 will choose the *string* value when it detects these items.
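
The dualvar behaviour described above can be sketched as follows. The
values 42 and "forty-two" are arbitrary, and Sereal::Decoder is
assumed to be installed for the round trip.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);
use Sereal::Encoder;
use Sereal::Decoder;

# A scalar whose numeric and string values are unrelated.
my $dual = dualvar(42, "forty-two");

my $encoder = Sereal::Encoder->new;
my $decoder = Sereal::Decoder->new;

# "0 but true" is a special string that should keep its Perl semantics.
my $copy = $decoder->decode($encoder->encode([ $dual, "0 but true" ]));

my $numeric_before = $dual + 0;
my $numeric_after  = do { no warnings 'numeric'; $copy->[0] + 0 };

print "before: string '$dual', number $numeric_before\n";
print "after:  string '$copy->[0]', number $numeric_after\n";
print "special string: '$copy->[1]'\n";
```

The string survives the round trip while the unrelated numeric value
is lost.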
719
720 It is possible that a future release of the protocol will fix these
721 issues.
722
724 For reporting bugs, please use the github bug tracker at
725 <http://github.com/Sereal/Sereal/issues>.
726
727 For support and discussion of Sereal, there are two Google Groups:
728
729 Announcements around Sereal (extremely low volume):
730 <https://groups.google.com/forum/?fromgroups#!forum/sereal-announce>
731
732 Sereal development list:
733 <https://groups.google.com/forum/?fromgroups#!forum/sereal-dev>
734
736 Yves Orton <demerphq@gmail.com>
737
738 Damian Gryski
739
740 Steffen Mueller <smueller@cpan.org>
741
742 Rafaël Garcia-Suarez
743
744 Ævar Arnfjörð Bjarmason <avar@cpan.org>
745
746 Tim Bunce
747
748 Daniel Dragan <bulkdd@cpan.org> (Windows support and bugfixes)
749
750 Zefram
751
752 Borislav Nikolov
753
754 Ivan Kruglov <ivan.kruglov@yahoo.com>
755
756 Some inspiration and code was taken from Marc Lehmann's excellent
757 JSON::XS module due to obvious overlap in problem domain. Thank you!
758
760 This module was originally developed for Booking.com. With approval
761 from Booking.com, this module was generalized and published on CPAN,
762 for which the authors would like to express their gratitude.
763
Copyright (C) 2012, 2013, 2014 by Steffen Mueller
Copyright (C) 2012, 2013, 2014 by Yves Orton
767
768 The license for the code in this distribution is the following, with
769 the exceptions listed below:
770
771 This library is free software; you can redistribute it and/or modify it
772 under the same terms as Perl itself.
773
774 Except portions taken from Marc Lehmann's code for the JSON::XS module,
775 which is licensed under the same terms as this module.
776
777 Also except the code for Snappy compression library, whose license is
778 reproduced below and which, to the best of our knowledge, is compatible
779 with this module's license. The license for the enclosed Snappy code
780 is:
781
782 Copyright 2011, Google Inc.
783 All rights reserved.
784
785 Redistribution and use in source and binary forms, with or without
786 modification, are permitted provided that the following conditions are
787 met:
788
789 * Redistributions of source code must retain the above copyright
790 notice, this list of conditions and the following disclaimer.
791 * Redistributions in binary form must reproduce the above
792 copyright notice, this list of conditions and the following disclaimer
793 in the documentation and/or other materials provided with the
794 distribution.
795 * Neither the name of Google Inc. nor the names of its
796 contributors may be used to endorse or promote products derived from
797 this software without specific prior written permission.
798
799 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
800 "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
801 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
802 A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
803 OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
804 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
805 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
806 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
807 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
808 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
809 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
810
811
812
813perl v5.32.1 2021-01-27 Sereal::Encoder(3)