XS(3)                 User Contributed Perl Documentation                XS(3)

NAME

       CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)


SYNOPSIS

        use CBOR::XS;

        $binary_cbor_data = encode_cbor $perl_value;
        $perl_value       = decode_cbor $binary_cbor_data;

        # OO-interface

        $coder = CBOR::XS->new;
        $binary_cbor_data = $coder->encode ($perl_value);
        $perl_value       = $coder->decode ($binary_cbor_data);

        # prefix decoding

        my $many_cbor_strings = ...;
        while (length $many_cbor_strings) {
           my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
           # data was decoded
           substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
        }


DESCRIPTION

       This module converts Perl data structures to the Concise Binary Object
       Representation (CBOR) and vice versa. CBOR is a fast binary
       serialisation format that aims to use an (almost) superset of the JSON
       data model, i.e. when you can represent something useful in JSON, you
       should be able to represent it in CBOR.

       In short, CBOR is a faster and quite compact binary alternative to
       JSON, with the added ability of supporting serialisation of Perl
       objects. (JSON often compresses better than CBOR though, so if you plan
       to compress the data later and speed is less important you might want
       to compare both formats first).

       To give you a general idea about speed, with texts in the megabyte
       range, "CBOR::XS" usually encodes roughly twice as fast as Storable or
       JSON::XS and decodes about 15%-30% faster than those. The shorter the
       data, the worse Storable performs in comparison.

       Regarding compactness, "CBOR::XS"-encoded data structures are usually
       about 20% smaller than the same data encoded as (compact) JSON or
       Storable.

       In addition to the core CBOR data format, this module implements a
       number of extensions, to support cyclic and shared data structures (see
       "allow_sharing" and "allow_cycles"), string deduplication (see
       "pack_strings") and scalar references (always enabled).

       The primary goal of this module is to be correct and the secondary goal
       is to be fast. To reach the latter goal it was written in C.

       See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
       vice versa.


FUNCTIONAL INTERFACE

       The following convenience functions are provided by this module. They
       are exported by default:

       $cbor_data = encode_cbor $perl_scalar
           Converts the given Perl data structure to CBOR representation.
           Croaks on error.

       $perl_scalar = decode_cbor $cbor_data
           The opposite of "encode_cbor": expects a valid CBOR string to
           parse, returning the resulting perl scalar. Croaks on error.

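       As a minimal sketch, a round trip through the functional interface
       might look like this. The data structure is made up for illustration,
       and the "eval" shows one way to trap the croak that "decode_cbor"
       raises on truncated input.

```perl
use strict;
use warnings;
use CBOR::XS;    # exports encode_cbor and decode_cbor by default

my $data    = { name => "widget", sizes => [1, 2, 3] };
my $cbor    = encode_cbor $data;     # a string of octets
my $decoded = decode_cbor $cbor;     # a new, equivalent data structure

print "name=$decoded->{name} sizes=@{ $decoded->{sizes} }\n";

# "\x82\x01" announces a two-element array but carries only one element,
# so decode_cbor croaks; eval turns the croak into a false result
my $ok = eval { decode_cbor "\x82\x01"; 1 };
print "truncated input rejected: $@" unless $ok;
```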

OBJECT-ORIENTED INTERFACE

       The object oriented interface lets you configure your own encoding or
       decoding style, within the limits of supported formats.

       $cbor = new CBOR::XS
           Creates a new CBOR::XS object that can be used to de/encode CBOR
           strings. All boolean flags described below are by default disabled.

           The mutators for flags all return the CBOR object again and thus
           calls can be chained:

              my $cbor_data = CBOR::XS->new->text_keys->encode ({a => [1,2]});

       $cbor = new_safe CBOR::XS
           Creates a new, safe/secure CBOR::XS object. This is similar to
           "new", but configures the coder object to be safe to use with
           untrusted data. Currently, this is equivalent to:

              my $cbor = CBOR::XS
                 ->new
                 ->forbid_objects
                 ->filter (\&CBOR::XS::safe_filter)
                 ->max_size (1e8);

           But it is more future-proof (it is better to crash because of a
           change than to be exploited in other ways).

       $cbor = $cbor->max_depth ([$maximum_nesting_depth])
       $max_depth = $cbor->get_max_depth
           Sets the maximum nesting level (default 512) accepted while
           encoding or decoding. If a higher nesting level is detected in CBOR
           data or a Perl data structure, then the encoder and decoder will
           stop and croak at that point.

           Nesting level is defined by the number of hash- or arrayrefs that
           the encoder needs to traverse to reach a given point, or the number
           of enclosing CBOR arrays and maps that the decoder has to descend
           into to reach a given value.

           Setting the maximum depth to one disallows any nesting, which
           ensures that the value is at most a single, flat hash/map or array.

           If no argument is given, the highest possible setting will be used,
           which is rarely useful.

           Note that nesting is implemented by recursion in C. The default
           value has been chosen to be as large as typical operating systems
           allow without crashing.

           See "SECURITY CONSIDERATIONS", below, for more info on why this is
           useful.
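
           The following minimal sketch shows the effect of a maximum depth of
           one. A flat array encodes, while an array nested inside another
           array makes "encode" croak. The sample values are arbitrary.

```perl
use strict;
use warnings;
use CBOR::XS;

my $flat = CBOR::XS->new->max_depth (1);

# depth 1: a single, flat array is accepted
my $flat_ok = eval { $flat->encode ([1, 2, 3]); 1 };

# depth 2: the inner array exceeds the limit, so encode croaks
my $nested_ok = eval { $flat->encode ([1, [2]]); 1 };

printf "max_depth=%d flat=%s nested=%s\n",
   $flat->get_max_depth, $flat_ok ? "ok" : "error", $nested_ok ? "ok" : "error";
```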

       $cbor = $cbor->max_size ([$maximum_string_size])
       $max_size = $cbor->get_max_size
           Sets the maximum length (in bytes) a CBOR string may have for
           decoding to be attempted. The default is 0, meaning no limit.
           When "decode" is called on a string that is longer than this many
           bytes, it will not attempt to decode the string but throw an
           exception. This setting has no effect on "encode" (yet).

           If no argument is given, the limit check will be deactivated (same
           as when 0 is specified).

           See "SECURITY CONSIDERATIONS", below, for more info on why this is
           useful.
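
           For illustration, here is a coder that refuses inputs longer than
           16 octets. The limit is an arbitrary choice for this sketch. The
           100-element array encodes to 102 octets: a two-octet array header
           plus one octet per small integer.

```perl
use strict;
use warnings;
use CBOR::XS;

my $limited = CBOR::XS->new->max_size (16);

my $small = encode_cbor [1];              # 2 octets
my $big   = encode_cbor [ (1) x 100 ];    # 102 octets

my $small_value = $limited->decode ($small);    # fine, below the limit
my $big_ok      = eval { $limited->decode ($big); 1 };

print "small: $small_value->[0], big: ", ($big_ok ? "decoded" : "rejected: $@"), "\n";
```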

       $cbor = $cbor->allow_unknown ([$enable])
       $enabled = $cbor->get_allow_unknown
           If $enable is true (or missing), then "encode" will not throw an
           exception when it encounters values it cannot represent in CBOR
           (for example, filehandles) but instead will encode a CBOR "error"
           value.

           If $enable is false (the default), then "encode" will throw an
           exception when it encounters anything it cannot encode as CBOR.

           This option does not affect "decode" in any way, and it is
           recommended to leave it off unless you know your communications
           partner.

       $cbor = $cbor->allow_sharing ([$enable])
       $enabled = $cbor->get_allow_sharing
           If $enable is true (or missing), then "encode" will not
           double-encode values that have been referenced before (e.g. when
           the same object, such as an array, is referenced multiple times),
           but instead will emit a reference to the earlier value.

           This means that such values will only be encoded once, and will not
           result in a deep cloning of the value on decode, in decoders
           supporting the value sharing extension. This also makes it possible
           to encode cyclic data structures (which need "allow_cycles" to be
           enabled to be decoded by this module).

           It is recommended to leave it off unless you know your
           communication partner supports the value sharing extensions to CBOR
           (<http://cbor.schmorp.de/value-sharing>), as without decoder
           support, the resulting data structure might be unusable.

           Detecting shared values incurs a runtime overhead when values are
           encoded that have a reference counter larger than one, and might
           unnecessarily increase the encoded size, as potentially shared
           values are encoded as shareable whether or not they are actually
           shared.

           At the moment, only targets of references can be shared (e.g.
           scalars, arrays or hashes pointed to by a reference). Weirder
           constructs, such as an array with multiple "copies" of the same
           string, which are hard but not impossible to create in Perl, are
           not supported (this is the same as with Storable).

           If $enable is false (the default), then "encode" will encode shared
           data structures repeatedly, unsharing them in the process. Cyclic
           data structures cannot be encoded in this mode.

           This option does not affect "decode" in any way - shared values and
           references will always be decoded properly if present.

       $cbor = $cbor->allow_cycles ([$enable])
       $enabled = $cbor->get_allow_cycles
           If $enable is true (or missing), then "decode" will happily decode
           self-referential (cyclic) data structures. By default these will
           not be decoded, as they need manual cleanup to avoid memory leaks,
           so code that isn't prepared for this will not leak memory.

           If $enable is false (the default), then "decode" will throw an
           error when it encounters a self-referential/cyclic data structure.

           FUTURE DIRECTION: the motivation behind this option is to avoid
           real cycles - future versions of this module might choose to decode
           cyclic data structures using weak references when this option is
           off, instead of throwing an error.

           This option does not affect "encode" in any way - shared values and
           references will always be encoded properly if present.
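
           The sketch below combines both options on a made-up
           self-referential hash. "allow_sharing" is needed to encode the
           cycle at all, and "allow_cycles" is needed to decode it again.
           Deleting the back-references at the end is one way to do the
           manual cleanup mentioned above.

```perl
use strict;
use warnings;
use CBOR::XS;

my $node = { name => "root" };
$node->{self} = $node;                      # a reference cycle

# without allow_sharing the encoder recurses until max_depth and croaks
my $unshared_ok = eval { CBOR::XS->new->encode ($node); 1 };

my $cbor = CBOR::XS->new->allow_sharing->encode ($node);

# the default decoder refuses cyclic data
my $plain_ok = eval { CBOR::XS->new->decode ($cbor); 1 };

my $copy     = CBOR::XS->new->allow_cycles->decode ($cbor);
my $is_cycle = $copy->{self} == $copy;      # same hash, compared by address

print "unshared: ", ($unshared_ok ? "ok" : "error"),
      ", plain decode: ", ($plain_ok ? "ok" : "error"),
      ", cycle restored: ", ($is_cycle ? "yes" : "no"), "\n";

# break both cycles so the hashes can be freed
delete $copy->{self};
delete $node->{self};
```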

       $cbor = $cbor->forbid_objects ([$enable])
       $enabled = $cbor->get_forbid_objects
           Disables the use of the object serialiser protocol.

           If $enable is true (or missing), then "encode" will throw an
           exception when it encounters perl objects that would be encoded
           using the perl-object tag (26). When "decode" encounters such tags,
           it will fall back to the general filter/tagged logic as if this
           were an unknown tag (by default resulting in a "CBOR::XS::Tagged"
           object).

           If $enable is false (the default), then "encode" will use the
           Types::Serialiser object serialisation protocol to serialise
           objects into perl-object tags, and "decode" will do the same to
           decode such tags.

           See "SECURITY CONSIDERATIONS", below, for more info on why
           forbidding this protocol can be useful.
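
           The following sketch uses a hypothetical class, "My::Point", which
           implements the "FREEZE"/"THAW" protocol described under "OBJECT
           SERIALISATION". With "forbid_objects" enabled, encoding such an
           object croaks. A perl-object tag produced by a default coder is
           then decoded into a "CBOR::XS::Tagged" object instead of being
           passed to "THAW".

```perl
use strict;
use warnings;
use CBOR::XS;

package My::Point {    # hypothetical example class
   sub new    { my ($class, $x, $y) = @_; bless { x => $x, y => $y }, $class }
   sub FREEZE { my ($self, $serialiser) = @_; ($self->{x}, $self->{y}) }
   sub THAW   { my ($class, $serialiser, $x, $y) = @_; $class->new ($x, $y) }
}

my $point  = My::Point->new (3, 4);
my $strict = CBOR::XS->new->forbid_objects;

# encoding a FREEZE-able object is refused when forbid_objects is on
my $encode_ok = eval { $strict->encode ($point); 1 };

# a default coder emits the perl-object tag (26) ...
my $cbor = CBOR::XS->new->encode ($point);

# ... which a default coder turns back into a My::Point via THAW,
my $thawed = CBOR::XS->new->decode ($cbor);

# ... while a forbid_objects coder leaves it as a generic tagged value
my $tagged = $strict->decode ($cbor);

print ref ($thawed), " / ", ref ($tagged), "\n";
```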

       $cbor = $cbor->pack_strings ([$enable])
       $enabled = $cbor->get_pack_strings
           If $enable is true (or missing), then "encode" will try not to
           encode the same string twice, but will instead encode a reference
           to its earlier occurrence. Depending on your data format, this can
           save a lot of space, but also results in a very large runtime
           overhead (expect encoding times to be 2-4 times as high as
           without).

           It is recommended to leave it off unless you know your
           communications partner supports the stringref extension to CBOR
           (<http://cbor.schmorp.de/stringref>), as without decoder support,
           the resulting data structure might not be usable.

           If $enable is false (the default), then "encode" will encode
           strings the standard CBOR way.

           This option does not affect "decode" in any way - string references
           will always be decoded properly if present.
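
           To get a feel for the trade-off, the sketch below encodes 50
           records that repeat the same key and value strings, once with and
           once without "pack_strings". The record layout is made up for this
           example. Actual savings depend entirely on how repetitive your
           data is, so the sketch only compares the two sizes.

```perl
use strict;
use warnings;
use CBOR::XS;

# 50 records repeating the strings "colour" and "magenta"
my @rows = map { +{ colour => "magenta", id => $_ } } 1 .. 50;

my $plain  = CBOR::XS->new->encode (\@rows);
my $packed = CBOR::XS->new->pack_strings->encode (\@rows);

printf "plain: %d octets, packed: %d octets\n", length $plain, length $packed;

# "decode" always understands string references, whatever the flag
my $back = CBOR::XS->new->decode ($packed);
print scalar (@$back), " rows, first colour: $back->[0]{colour}\n";
```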

       $cbor = $cbor->text_keys ([$enable])
       $enabled = $cbor->get_text_keys
           If $enable is true (or missing), then "encode" will encode all
           perl hash keys as CBOR text strings/UTF-8 strings, upgrading them
           as needed.

           If $enable is false (the default), then "encode" will encode hash
           keys normally - upgraded perl strings (strings internally encoded
           as UTF-8) as CBOR text strings, and downgraded perl strings as CBOR
           byte strings.

           This option does not affect "decode" in any way.

           This option is useful for interoperability with CBOR decoders that
           don't treat byte strings as a form of text. It is especially useful
           as Perl gives very little control over hash keys.

           Enabling this option can be slow, as all downgraded hash keys that
           are encoded need to be scanned and converted to UTF-8.

       $cbor = $cbor->text_strings ([$enable])
       $enabled = $cbor->get_text_strings
           This option works similarly to "text_keys", above, but works on all
           strings (including hash keys), so "text_keys" has no further effect
           after enabling "text_strings".

           If $enable is true (or missing), then "encode" will encode all
           perl strings as CBOR text strings/UTF-8 strings, upgrading them as
           needed.

           If $enable is false (the default), then "encode" will encode
           strings normally (but see "text_keys") - upgraded perl strings
           (strings internally encoded as UTF-8) as CBOR text strings, and
           downgraded perl strings as CBOR byte strings.

           This option does not affect "decode" in any way.

           This option has the same advantages and disadvantages as
           "text_keys". In addition, this option effectively removes the
           ability to encode byte strings, which might break some "FREEZE" and
           "TO_CBOR" methods that rely on this, such as bignum encoding, so
           this option is mainly useful for very simple data.
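
           You can check the byte-level effect of these options directly. In
           CBOR, the initial byte 0x41 introduces a one-octet byte string and
           0x61 a one-octet text string. The one-character values below are
           chosen only to keep the output short.

```perl
use strict;
use warnings;
use CBOR::XS;

my $plain_coder = CBOR::XS->new;
my $text_coder  = CBOR::XS->new->text_strings;

my $s = "a";                                   # downgraded perl string
my $as_bytes = $plain_coder->encode ($s);      # "\x41" . "a": CBOR byte string
my $as_text  = $text_coder->encode ($s);       # "\x61" . "a": CBOR text string

my $u = "a";
utf8::upgrade ($u);                            # same characters, now upgraded
my $upgraded = $plain_coder->encode ($u);      # text string even without text_strings

# hash keys: byte strings by default, text strings with text_keys
my $map_bytes = $plain_coder->encode ({ a => 1 });
my $map_text  = CBOR::XS->new->text_keys->encode ({ a => 1 });

print join (" ", map { unpack ("H*", $_) } $as_bytes, $as_text, $upgraded, $map_bytes, $map_text), "\n";
```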

       $cbor = $cbor->validate_utf8 ([$enable])
       $enabled = $cbor->get_validate_utf8
           If $enable is true (or missing), then "decode" will validate that
           elements (text strings) containing UTF-8 data in fact contain valid
           UTF-8 data (instead of blindly accepting it). This validation
           obviously takes extra time during decoding.

           The concept of "valid UTF-8" used is perl's concept, which is a
           superset of the official UTF-8.

           If $enable is false (the default), then "decode" will blindly
           accept UTF-8 data, marking them as valid UTF-8 in the resulting
           data structure regardless of whether that's true or not.

           Perl isn't too happy about corrupted UTF-8 in strings, but should
           generally not crash or do similarly evil things. Extensions might
           not be so forgiving, so it's recommended to turn on this setting if
           you receive untrusted CBOR.

           This option does not affect "encode" in any way - strings that are
           supposedly valid UTF-8 will simply be dumped into the resulting
           CBOR string without checking whether that is, in fact, true or not.
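
           Here is a small sketch of the difference. "\x61\x80" is a
           one-octet CBOR text string whose only octet, 0x80, is a lone UTF-8
           continuation byte and therefore never valid. The default coder
           accepts it, while a validating coder croaks. Well-formed text such
           as "\x62\xc3\xa9" (U+00E9 in UTF-8) still decodes.

```perl
use strict;
use warnings;
use CBOR::XS;

my $corrupt = "\x61\x80";        # text string of length 1, invalid UTF-8
my $valid   = "\x62\xc3\xa9";    # text string of length 2, UTF-8 for U+00E9

my $lax_ok    = eval { CBOR::XS->new->decode ($corrupt); 1 };
my $strict    = CBOR::XS->new->validate_utf8;
my $strict_ok = eval { $strict->decode ($corrupt); 1 };
my $text      = $strict->decode ($valid);

print "lax: ", ($lax_ok ? "accepted" : "rejected"),
      ", strict: ", ($strict_ok ? "accepted" : "rejected"),
      ", valid text has ", length $text, " character(s)\n";
```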

       $cbor = $cbor->filter ([$cb->($tag, $value)])
       $cb_or_undef = $cbor->get_filter
           Sets or replaces the tagged value decoding filter (when $cb is
           specified) or clears the filter (if no argument or "undef" is
           provided).

           The filter callback is called only during decoding, when a
           non-enforced tagged value has been decoded (see "TAG HANDLING AND
           EXTENSIONS" for a list of enforced tags). For specific tags, it's
           often better to provide a default converter using the
           %CBOR::XS::FILTER hash (see below).

           The first argument is the numerical tag, the second is the
           (decoded) value that has been tagged.

           The filter function should return either exactly one value, which
           will replace the tagged value in the decoded data structure, or no
           values, which will result in default handling, which currently
           means the decoder creates a "CBOR::XS::Tagged" object to hold the
           tag and the value.

           When the filter is cleared (the default state), the default filter
           function, "CBOR::XS::default_filter", is used. This function simply
           looks up the tag in the %CBOR::XS::FILTER hash. If an entry exists
           it must be a code reference that is called with tag and value, and
           is responsible for decoding the value. If no entry exists, it
           returns no values. "CBOR::XS" provides a number of default filter
           functions already, and the %CBOR::XS::FILTER hash can be freely
           extended with more.

           "CBOR::XS" additionally provides an alternative filter function
           that is supposed to be safe to use with untrusted data (which the
           default filter might not), called "CBOR::XS::safe_filter", which
           works the same as the "default_filter" but uses the
           %CBOR::XS::SAFE_FILTER variable instead. It is prepopulated with
           the tag decoding functions that are deemed safe (basically the same
           as %CBOR::XS::FILTER without all the bignum tags), and can be
           extended by user code as well, although, obviously, one should be
           very careful about adding decoding functions here, since the
           expectation is that they are safe to use on untrusted data, after
           all.

           Example: decode all tags not handled internally into
           "CBOR::XS::Tagged" objects, with no other special handling (useful
           when working with potentially "unsafe" CBOR data).

              CBOR::XS->new->filter (sub { })->decode ($cbor_data);

           Example: provide a global filter for tag 1347375694, converting the
           value into some string form.

              $CBOR::XS::FILTER{1347375694} = sub {
                 my ($tag, $value) = @_;

                 "tag 1347375694 value $value"
              };

           Example: provide your own filter function that looks up tags in
           your own hash:

              my %my_filter = (
                 998347484 => sub {
                    my ($tag, $value) = @_;

                    "tag 998347484 value $value"
                 },
              );

              my $coder = CBOR::XS->new->filter (sub {
                 &{ $my_filter{$_[0]} or return }
              });

           Example: use the safe filter function (see "SECURITY
           CONSIDERATIONS" for more considerations on security).

              CBOR::XS->new->filter (\&CBOR::XS::safe_filter)->decode ($cbor_data);
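
           Putting the pieces together, the following sketch builds a tagged
           value with "CBOR::XS::tag" and decodes it twice. The first pass
           uses the default filter, which knows nothing about the made-up tag
           1347375694 from the examples above and so yields a
           "CBOR::XS::Tagged" object. The second pass uses a custom filter
           that handles only this tag and returns nothing for all others.

```perl
use strict;
use warnings;
use CBOR::XS;

my $TAG   = 1347375694;                        # arbitrary tag from the examples above
my $bytes = encode_cbor [ CBOR::XS::tag ($TAG, "payload") ];

# default filter: unknown tags become CBOR::XS::Tagged objects ([tag, value])
my $default = decode_cbor $bytes;

# custom filter: handle one tag, fall back to default handling for the rest
my $coder = CBOR::XS->new->filter (sub {
   my ($tag, $value) = @_;
   return "tag $tag: $value" if $tag == $TAG;
   return;                                     # no values => default handling
});
my $filtered = $coder->decode ($bytes);

print ref ($default->[0]), " / $filtered->[0]\n";
```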

       $cbor_data = $cbor->encode ($perl_scalar)
           Converts the given Perl data structure (a scalar value) to its CBOR
           representation.

       $perl_scalar = $cbor->decode ($cbor_data)
           The opposite of "encode": expects CBOR data and tries to parse it,
           returning the resulting simple scalar or reference. Croaks on
           error.

       ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
           This works like the "decode" method, but instead of raising an
           exception when there is trailing garbage after the CBOR string, it
           will silently stop parsing there and return the number of octets
           consumed so far.

           This is useful if your CBOR texts are not delimited by an outer
           protocol and you need to know where the first CBOR string ends and
           the next one starts.

              CBOR::XS->new->decode_prefix ("......")
              => ("...", 3)
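
           For example, a loop like the one in the SYNOPSIS can split
           back-to-back CBOR values. The three messages here are arbitrary
           sample data.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new;

# concatenate three independently encoded values without any framing
my @messages = ([1, 2], "hello", { k => 3 });
my $stream   = "";
$stream .= $coder->encode ($_) for @messages;

my @values;
while (length $stream) {
   my ($value, $octets) = $coder->decode_prefix ($stream);
   push @values, $value;
   substr $stream, 0, $octets, "";    # drop the octets that were consumed
}

print scalar (@values), " values decoded\n";
```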

   INCREMENTAL PARSING
       In some cases, there is the need for incremental parsing of CBOR texts.
       While this module always has to keep both CBOR text and resulting Perl
       data structure in memory at one time, it does allow you to parse a CBOR
       stream incrementally. This is similar to using "decode_prefix" to see
       whether a full CBOR object is available, but much more efficient.

       It basically works by parsing as much of a CBOR string as possible - if
       the CBOR data is not complete yet, the parser will remember where it
       was, to be able to restart when more data has been accumulated. Once
       enough data is available to either decode a complete CBOR value or
       raise an error, a real decode will be attempted.

       A typical use case would be a network protocol that consists of sending
       and receiving CBOR-encoded messages. The solution that works with CBOR
       and about anything else is to prepend a length to every CBOR value, so
       the receiver knows how many octets to read. More compact (and slightly
       slower) would be to just send CBOR values back-to-back, as "CBOR::XS"
       knows where a CBOR value ends, and doesn't need an explicit length.

       The following methods help with this:

       @decoded = $cbor->incr_parse ($buffer)
           This method attempts to decode exactly one CBOR value from the
           beginning of the given $buffer. The value is removed from the
           $buffer on success. When $buffer doesn't contain a complete value
           yet, it returns nothing. Finally, when the $buffer doesn't start
           with something that could ever be a valid CBOR value, it raises an
           exception, just as "decode" would. In the latter case the decoder
           state is undefined and must be reset before being able to parse
           further.

           This method modifies the $buffer in place. When no CBOR value can
           be decoded, the decoder stores the current string offset. On the
           next call, it continues decoding at the place where it stopped
           before. For this to make sense, the $buffer must begin with the
           same octets as on previous unsuccessful calls.

           You can call this method in scalar context, in which case it either
           returns a decoded value or "undef". This makes it impossible to
           distinguish between CBOR null values (which decode to "undef") and
           an unsuccessful decode, which is often acceptable.

       @decoded = $cbor->incr_parse_multiple ($buffer)
           Same as "incr_parse", but attempts to decode as many CBOR values as
           possible in one go, instead of at most one. Calls to "incr_parse"
           and "incr_parse_multiple" can be interleaved.

       $cbor->incr_reset
           Resets the incremental decoder. This throws away any saved state,
           so that subsequent calls to "incr_parse" or "incr_parse_multiple"
           start to parse a new CBOR value from the beginning of the $buffer
           again.

           This method can be called at any time, but it must be called if you
           want to change your $buffer or there was a decoding error and you
           want to reuse the $cbor object for future incremental parsings.
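
       The following sketch simulates a stream arriving in small pieces. Two
       values are encoded back-to-back and fed to "incr_parse_multiple"
       three octets at a time. The chunk size is an arbitrary stand-in for
       network reads. Each call returns whatever values have become
       complete. The buffer is only ever appended to, as required above.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new;
my $wire  = $coder->encode ([1, 2, 3]) . $coder->encode ({ ok => 1 });

my $buffer = "";
my @received;

for my $chunk (unpack "(a3)*", $wire) {    # split the wire data into 3-octet chunks
   $buffer .= $chunk;                      # append only; never alter the front
   push @received, $coder->incr_parse_multiple ($buffer);
}

print scalar (@received), " values received, ", length $buffer, " octets left\n";
```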


MAPPING

       This section describes how CBOR::XS maps Perl values to CBOR values and
       vice versa. These mappings are designed to "do the right thing" in most
       circumstances automatically, preserving round-tripping characteristics
       (what you put in comes out as something equivalent).

       For the more enlightened: note that in the following descriptions,
       lowercase perl refers to the Perl interpreter, while uppercase Perl
       refers to the abstract Perl language itself.

   CBOR -> PERL
       integers
           CBOR integers become (numeric) perl scalars. On perls without 64
           bit support, 64 bit integers will be truncated or otherwise
           corrupted.

       byte strings
           Byte strings will become octet strings in Perl (the byte values
           0..255 will simply become characters of the same value in Perl).

       UTF-8 strings
           UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will
           be decoded into proper Unicode code points. Unless "validate_utf8"
           is enabled, the validity of the UTF-8 octets will not be checked -
           corrupt input will result in corrupted Perl strings.

       arrays, maps
           CBOR arrays and CBOR maps will be converted into references to a
           Perl array or hash, respectively. The keys of the map will be
           stringified during this process.

       null
           CBOR null becomes "undef" in Perl.

       true, false, undefined
           These CBOR values become "Types::Serialiser::true",
           "Types::Serialiser::false" and "Types::Serialiser::error",
           respectively. They are overloaded to act almost exactly like the
           numbers 1 and 0 (for true and false) or to throw an exception on
           access (for error). See the Types::Serialiser manpage for details.

       tagged values
           Tagged items consist of a numeric tag and another CBOR value.

           See "TAG HANDLING AND EXTENSIONS" and the description of "->filter"
           for details on which tags are handled how.

       anything else
           Anything else (e.g. unsupported simple values) will raise a
           decoding error.

   PERL -> CBOR
       The mapping from Perl to CBOR is slightly more difficult, as Perl is a
       typeless language. That means this module can only guess which CBOR
       type is meant by a perl value.

       hash references
           Perl hash references become CBOR maps. As there is no inherent
           ordering in hash keys (or CBOR maps), they will usually be encoded
           in a pseudo-random order. This order can be different each time a
           hash is encoded.

           Currently, tied hashes will use the indefinite-length format, while
           normal hashes will use the fixed-length format.

       array references
           Perl array references become fixed-length CBOR arrays.

       other references
           Other unblessed references will be represented using the
           indirection tag extension (tag value 22098,
           <http://cbor.schmorp.de/indirection>). CBOR decoders are guaranteed
           to be able to decode these values somehow, by either "doing the
           right thing", decoding into a generic tagged object, simply
           ignoring the tag, or something else.

       CBOR::XS::Tagged objects
           Objects of this type must be arrays consisting of a single
           "[tag, value]" pair. The (numerical) tag will be encoded as a CBOR
           tag, the value will be encoded as appropriate for the value. You
           must use "CBOR::XS::tag" to create such objects.

       Types::Serialiser::true, Types::Serialiser::false,
       Types::Serialiser::error
           These special values become CBOR true, CBOR false and CBOR
           undefined values, respectively. You can also use "\1", "\0" and
           "\undef" directly if you want.
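
           The following sketch checks the boolean mappings in both
           directions. It relies on the package variables
           $Types::Serialiser::true and $Types::Serialiser::false provided by
           Types::Serialiser, and on the "\1"/"\0" shortcuts described above.
           0xf5 and 0xf4 are the CBOR encodings of true and false.

```perl
use strict;
use warnings;
use CBOR::XS;
use Types::Serialiser;

# the Types::Serialiser values and the \1 / \0 shortcuts encode identically
my $bytes = encode_cbor [ $Types::Serialiser::true, $Types::Serialiser::false, \1, \0 ];

# decoding yields overloaded boolean objects that behave like 1 and 0
my $flags = decode_cbor $bytes;

print unpack ("H*", $bytes), ": ",
      join (" ", map { $_ ? "true" : "false" } @$flags), "\n";
```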

       other blessed objects
           Other blessed objects are serialised via "TO_CBOR" or "FREEZE". See
           "TAG HANDLING AND EXTENSIONS" for specific classes handled by this
           module, and "OBJECT SERIALISATION" for generic object
           serialisation.

       simple scalars
           Simple Perl scalars (any scalar that is not a reference) are the
           most difficult objects to encode: CBOR::XS will encode undefined
           scalars as CBOR null values, scalars that have last been used in a
           string context before encoding as CBOR strings, and anything else
           as numbers:

              # dump as number
              encode_cbor [2]                      # yields [2]
              encode_cbor [-3.0e17]                # yields [-3e+17]
              my $value = 5; encode_cbor [$value]  # yields [5]

              # used as string, so dump as string (either byte or text)
              print $value;
              encode_cbor [$value]                 # yields ["5"]

              # undef becomes null
              encode_cbor [undef]                  # yields [null]

           You can force the type to be a CBOR string by stringifying it:

              my $x = 3.1; # some variable containing a number
              "$x";        # stringified
              $x .= "";    # another, more awkward way to stringify
              print $x;    # perl does it for you, too, quite often

           You can force whether a string is encoded as byte or text string by
           using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is
           disabled):

              utf8::upgrade $x;   # encode $x as text string
              utf8::downgrade $x; # encode $x as byte string

           Perl doesn't define what operations up- and downgrade strings, so
           if the difference between byte and text is important, you should
           up- or downgrade your string as late as possible before encoding.
           You can also force the use of CBOR text strings by using
           "text_keys" or "text_strings".

           You can force the type to be a CBOR number by numifying it:

              my $x = "3"; # some variable containing a string
              $x += 0;     # numify it, ensuring it will be dumped as a number
              $x *= 1;     # same thing, the choice is yours.

           You cannot currently force the type in other, less obscure, ways.
           Tell me if you need this capability (but don't forget to explain
           why it's needed :).

           Perl values that seem to be integers generally use the shortest
           possible representation. Floating-point values will use either the
           IEEE single format if possible without loss of precision, otherwise
           the IEEE double format will be used. Perls that use formats other
           than IEEE double to represent numerical values are supported, but
           might suffer loss of precision.
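
           You can observe these heuristics on the encoded octets. This
           sketch relies only on the behaviour stated above. A number encodes
           as a CBOR integer (0x05 for 5). A string literal encodes as a byte
           string (0x41 followed by the octet). Numifying with "+= 0" turns a
           string back into a number, and "undef" becomes null (0xf6).

```perl
use strict;
use warnings;
use CBOR::XS;

my $number = 5;
my $string = "5";
my $x      = "3";
$x += 0;                                  # numify: now encoded as a number

my $as_number = encode_cbor $number;      # unsigned integer 5
my $as_string = encode_cbor $string;      # one-octet byte string "5"
my $numified  = encode_cbor $x;           # unsigned integer 3
my $null      = encode_cbor undef;        # CBOR null

print join (" ", map { unpack ("H*", $_) } $as_number, $as_string, $numified, $null), "\n";
```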

   OBJECT SERIALISATION
       This module implements both a CBOR-specific and the generic
       Types::Serialiser object serialisation protocol. The following
       subsections explain both methods.

       ENCODING

       This module knows two ways to serialise a Perl object: the
       CBOR-specific way, and the generic way.

       Whenever the encoder encounters a Perl object that it cannot serialise
       directly (most of them), it will first look up the "TO_CBOR" method on
       it.

       If it has a "TO_CBOR" method, it will call it with the object as its
       only argument, and expects exactly one return value, which it will then
       encode in place of the object.

       Otherwise, it will look up the "FREEZE" method. If it exists, it will
       call it with the object as first argument, and the constant string
       "CBOR" as the second argument, to distinguish it from other
       serialisers.

       The "FREEZE" method can return any number of values (i.e. zero or
       more). These will be encoded as a CBOR perl object, together with the
       classname.

       These methods MUST NOT change the data structure that is being
       serialised. Failure to comply with this can result in memory
       corruption - and worse.

       If an object supports neither "TO_CBOR" nor "FREEZE", encoding will
       fail with an error.
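       The lookup order can be sketched with three small classes. All package
       names are hypothetical; the check on the FREEZE output relies only on
       the fact that a CBOR perl object starts with tag 26, which CBOR
       encodes as the two bytes 0xd8 0x1a.

```perl
          use strict;
          use warnings;
          use CBOR::XS;

          package My::WithToCbor;   # hypothetical class: CBOR-specific way
          sub new     { my ($class, $id) = @_; bless { id => $id }, $class }
          sub TO_CBOR { my ($self) = @_; [id => $self->{id}] }   # substitute value

          package My::WithFreeze;   # hypothetical class: generic protocol
          sub new    { my ($class, $id) = @_; bless { id => $id }, $class }
          sub FREEZE { my ($self, $serialiser) = @_; $self->{id} }   # $serialiser is "CBOR"

          package My::Plain;        # hypothetical class with neither method
          sub new { my ($class) = @_; bless {}, $class }

          package main;

          # TO_CBOR: the returned array is encoded instead, so it decodes as a plain array
          my $plain = decode_cbor (encode_cbor (My::WithToCbor->new (7)));

          # FREEZE: encoded as a perl object (tag 26) with classname and values
          my $frozen = encode_cbor (My::WithFreeze->new (7));

          # neither method: encoding dies, so $ok stays undef
          my $ok = eval { encode_cbor (My::Plain->new); 1 };
```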

       DECODING

       Objects encoded via "TO_CBOR" cannot (normally) be automatically
       decoded, but objects encoded via "FREEZE" can be decoded using the
       following protocol:

       When an encoded CBOR perl object is encountered by the decoder, it will
       look up the "THAW" method in the package named by the stored
       classname, and will fail if the method cannot be found.

       After the lookup it will call the "THAW" method with the stored
       classname as first argument, the constant string "CBOR" as second
       argument, and all values returned by "FREEZE" as remaining arguments.
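       A complete round trip through this protocol might look like the
       following sketch, using a hypothetical "My::Point" class and the
       default coder settings (objects are not forbidden):

```perl
          use strict;
          use warnings;
          use CBOR::XS;

          package My::Point;   # hypothetical example class

          sub new {
             my ($class, %arg) = @_;
             bless { x => $arg{x}, y => $arg{y} }, $class
          }

          sub FREEZE {
             my ($self, $serialiser) = @_;   # $serialiser is the string "CBOR"
             ($self->{x}, $self->{y})        # any number of values may be returned
          }

          sub THAW {
             my ($class, $serialiser, $x, $y) = @_;  # classname, "CBOR", FREEZE values
             $class->new (x => $x, y => $y)
          }

          package main;

          my $cbor  = encode_cbor (My::Point->new (x => 1, y => 2));
          my $point = decode_cbor ($cbor);   # calls My::Point->THAW ("CBOR", 1, 2)
```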

       EXAMPLES

       Here is an example "TO_CBOR" method:

          sub My::Object::TO_CBOR {
             my ($obj) = @_;

             ["this is a serialised My::Object object", $obj->{id}]
          }

       When a "My::Object" is encoded to CBOR, it will instead encode a simple
       array with two members: a string, and the "object id". Decoding this
       CBOR string will yield a normal perl array reference in place of the
       object.

       A more useful and practical example would be a serialisation method for
       the URI module. CBOR has a custom tag value for URIs, namely 32:

          sub URI::TO_CBOR {
             my ($self) = @_;
             my $uri = "$self";  # stringify uri
             utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
             CBOR::XS::tag 32, $uri
          }

       This will encode URIs as a UTF-8 string with tag 32, which indicates a
       URI.
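
       Without a decoding filter for tag 32, decoding such a URI would give
       you a CBOR::XS::Tagged object with tag number 32 and the string -
       exactly what was returned by "TO_CBOR". (The default filter for tag
       32, described under "NON-ENFORCED TAGS", turns it back into a URI
       object, provided the URI module is available.)
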
       To serialise an object so it can automatically be deserialised, you
       need to use "FREEZE" and "THAW". To take the URI module as example,
       this would be a possible implementation:

          sub URI::FREEZE {
             my ($self, $serialiser) = @_;
             "$self" # encode uri string
          }

          sub URI::THAW {
             my ($class, $serialiser, $uri) = @_;
             $class->new ($uri)
          }

       Unlike "TO_CBOR", multiple values can be returned by "FREEZE". For
       example, a "FREEZE" method that returns "type", "id" and "variant"
       values would cause an invocation of "THAW" with 5 arguments:

          sub My::Object::FREEZE {
             my ($self, $serialiser) = @_;

             ($self->{type}, $self->{id}, $self->{variant})
          }

          sub My::Object::THAW {
             my ($class, $serialiser, $type, $id, $variant) = @_;

             $class->new (type => $type, id => $id, variant => $variant)
          }


MAGIC HEADER

       There is no way to distinguish CBOR from other formats
       programmatically. To make it easier to distinguish CBOR from other
       formats, the CBOR specification has a special "magic string" that can
       be prepended to any CBOR string without changing its meaning.

       This string is available as $CBOR::XS::MAGIC. This module does not
       prepend this string to the CBOR data it generates, but it will ignore
       it if present, so users can prepend this string as a "file type"
       indicator as required.
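       A minimal sketch of using the magic string as a file-type marker;
       "$file_data" and "$looks_like_cbor" are illustrative names, not part
       of the module:

```perl
          use strict;
          use warnings;
          use CBOR::XS;

          # writer: prepend the magic string as a "file type" indicator
          my $file_data = $CBOR::XS::MAGIC . encode_cbor ([1, 2, 3]);

          # reader: sniff the marker without decoding anything
          my $looks_like_cbor =
             substr ($file_data, 0, length $CBOR::XS::MAGIC) eq $CBOR::XS::MAGIC;

          # the decoder skips the magic (tag 55799) and returns the array
          my $decoded = decode_cbor ($file_data);
```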


THE CBOR::XS::Tagged CLASS

       CBOR has the concept of tagged values - any CBOR value can be tagged
       with a numeric tag (an unsigned 64 bit number), and tag values are
       centrally administered.

       "CBOR::XS" handles a few tags internally when en- or decoding. You can
       also create tags yourself by encoding "CBOR::XS::Tagged" objects, and
       the decoder will create "CBOR::XS::Tagged" objects itself when it hits
       an unknown tag.

       These objects are simply blessed array references - the first member of
       the array being the numerical tag, the second being the value.

       You can interact with "CBOR::XS::Tagged" objects in the following ways:

       $tagged = CBOR::XS::tag $tag, $value
           This function(!) creates a new "CBOR::XS::Tagged" object using the
           given $tag (0..2**64-1) to tag the given $value (which can be any
           Perl value that can be encoded in CBOR, including serialisable Perl
           objects and "CBOR::XS::Tagged" objects).

       $tagged->[0]
       $tagged->[0] = $new_tag
       $tag = $tagged->tag
       $new_tag = $tagged->tag ($new_tag)
           Access/mutate the tag.

       $tagged->[1]
       $tagged->[1] = $new_value
       $value = $tagged->value
       $new_value = $tagged->value ($new_value)
           Access/mutate the tagged value.
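       The following sketch exercises the accessors listed above. The tag
       number 4242 is an arbitrary value chosen for illustration; as a tag
       without a decoding filter, it comes back as a "CBOR::XS::Tagged"
       object.

```perl
          use strict;
          use warnings;
          use CBOR::XS;

          my $tagged = CBOR::XS::tag (4242, "payload");  # arbitrary tag number

          my $tag   = $tagged->tag;     # 4242, same as $tagged->[0]
          my $value = $tagged->value;   # "payload", same as $tagged->[1]

          $tagged->value ("new payload");  # mutate via the accessor

          # an unknown tag decodes into a CBOR::XS::Tagged object again
          my $back = decode_cbor (encode_cbor ($tagged));
```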

   EXAMPLES
       Here are some examples of "CBOR::XS::Tagged" uses to tag objects.

       You can look up CBOR tag values and meanings in the IANA registry at
       <http://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>.

       Prepend a magic header ($CBOR::XS::MAGIC):

          my $cbor = encode_cbor CBOR::XS::tag 55799, $value;
          # same as:
          my $cbor = $CBOR::XS::MAGIC . encode_cbor $value;

       Serialise some URIs and a regex in an array:

          my $cbor = encode_cbor [
             (CBOR::XS::tag 32, "http://www.nethype.de/"),
             (CBOR::XS::tag 32, "http://software.schmorp.de/"),
             (CBOR::XS::tag 35, "^[Pp][Ee][Rr][lL]\$"),
          ];

       Wrap CBOR data in CBOR:

          my $cbor_cbor = encode_cbor
             CBOR::XS::tag 24,
                encode_cbor [1, 2, 3];


TAG HANDLING AND EXTENSIONS

       This section describes how this module handles specific tagged values
       and extensions. If a tag is not mentioned here and no additional
       filters are provided for it, then the default handling applies
       (creating a CBOR::XS::Tagged object on decoding, and only encoding the
       tag when explicitly requested).

       Tags not handled specifically are currently converted into a
       CBOR::XS::Tagged object, which is simply a blessed array reference
       consisting of the numeric tag value followed by the (decoded) CBOR
       value.

       Future versions of this module reserve the right to special case
       additional tags (such as base64url).

   ENFORCED TAGS
       These tags are always handled when decoding, and their handling cannot
       be overridden by the user.

       26 (perl-object, <http://cbor.schmorp.de/perl-object>)
           These tags are automatically created (and decoded) for serialisable
           objects using the "FREEZE/THAW" methods (the Types::Serialiser
           object serialisation protocol). See "OBJECT SERIALISATION" for
           details.

       28, 29 (shareable, sharedref, <http://cbor.schmorp.de/value-sharing>)
           These tags are automatically decoded when encountered, resulting
           in shared values in the decoded object (cyclic data structures,
           however, are only decoded when "allow_cycles" is enabled). They
           are only encoded, however, when "allow_sharing" is enabled.

           Not all shared values can be successfully decoded: values that
           reference themselves will currently decode as "undef" (this is not
           the same as a reference pointing to itself, which will be
           represented as a value that contains an indirect reference to
           itself - these will be decoded properly).

           Note that considerably more shared value data structures can be
           decoded than will be encoded - currently, only values pointed to by
           references will be shared, others will not. While non-reference
           shared values can be generated in Perl with some effort, they were
           considered too unimportant to be supported in the encoder. The
           decoder, however, will decode these values as shared values.

       256, 25 (stringref-namespace, stringref,
       <http://cbor.schmorp.de/stringref>)
           These tags are automatically decoded when encountered. They are
           only encoded, however, when "pack_strings" is enabled.

       22098 (indirection, <http://cbor.schmorp.de/indirection>)
           This tag is automatically generated when a reference is
           encountered (with the exception of hash and array references). It
           is converted to a reference when decoding.

       55799 (self-describe CBOR, RFC 7049)
           This value is not generated on encoding (unless explicitly
           requested by the user), and is simply ignored when decoding.
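       The effect of value sharing (tags 28/29) and of indirection (tag
       22098) can be observed with a short sketch; the data is made up, and
       reference identity is checked by comparing reference addresses:

```perl
          use strict;
          use warnings;
          use CBOR::XS;

          my $shared = [1, 2, 3];
          my $data   = [$shared, $shared];   # the same array referenced twice

          # default: the array is encoded twice and decodes into two copies
          my $copies = decode_cbor (encode_cbor ($data));

          # allow_sharing: tags 28/29 mark the array as shared on encoding
          my $coder = CBOR::XS->new->allow_sharing;
          my $same  = $coder->decode ($coder->encode ($data));

          # a reference to a non-container value is encoded with tag 22098
          my $ref = decode_cbor (encode_cbor (\"hello"));
```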

   NON-ENFORCED TAGS
       These tags have default filters provided when decoding. Their handling
       can be overridden by changing the %CBOR::XS::FILTER entry for the tag,
       or by providing a custom "filter" callback when decoding.

       When they result in decoding into a specific Perl class, the module
       usually provides a corresponding "TO_CBOR" method as well.

       When any of these need to load additional modules that are not part of
       the perl core distribution (e.g. URI), it is (currently) up to the user
       to provide these modules. The decoding usually fails with an exception
       if the required module cannot be loaded.
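       As a sketch of overriding a default filter, the callback below (a
       hypothetical policy, not something the module prescribes) keeps tag 32
       URIs as plain strings, so the URI module is not needed, and hands
       every other tag to "CBOR::XS::default_filter", assumed here to be the
       function that consults %CBOR::XS::FILTER when no filter is set. A
       filter that returns no values leaves the tag as a "CBOR::XS::Tagged"
       object; tag 4242 is an arbitrary number without a filter.

```perl
          use strict;
          use warnings;
          use CBOR::XS;

          my $coder = CBOR::XS->new->filter (sub {
             my ($tag, $value) = @_;
             return $value if $tag == 32;             # URI: keep the plain string
             CBOR::XS::default_filter ($tag, $value)  # everything else: defaults
          });

          my $cbor = encode_cbor ([
             CBOR::XS::tag (32, "http://www.nethype.de/"),
             CBOR::XS::tag (4242, "unknown"),   # arbitrary tag, no filter for it
          ]);

          my $decoded = $coder->decode ($cbor);
          # $decoded->[0]: the string "http://www.nethype.de/"
          # $decoded->[1]: a CBOR::XS::Tagged object with tag 4242
```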

       0, 1 (date/time string, seconds since the epoch)
           These tags are decoded into Time::Piece objects. The corresponding
           "Time::Piece::TO_CBOR" method currently always encodes into tag 1
           values.

           The Time::Piece API is generally surprisingly bad, and fractional
           seconds are only accidentally kept intact, so watch out. On the
           plus side, the module comes with perl since 5.10, which has to
           count for something.

       2, 3 (positive/negative bignum)
           These tags are decoded into Math::BigInt objects. The corresponding
           "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal
           CBOR integers, and others into positive/negative CBOR bignums.

       4, 5, 264, 265 (decimal fraction/bigfloat)
           Both decimal fractions and bigfloats are decoded into
           Math::BigFloat objects. The corresponding "Math::BigFloat::TO_CBOR"
           method always encodes into a decimal fraction (either tag 4 or
           264).

           NaN and infinities are not encoded properly, as they cannot be
           represented as CBOR decimal fractions.

           See "BIGNUM SECURITY CONSIDERATIONS" for more info.

       30 (rational numbers)
           These tags are decoded into Math::BigRat objects. The corresponding
           "Math::BigRat::TO_CBOR" method encodes rational numbers with
           denominator 1 via their numerator only, i.e., they become normal
           integers or "bignums".

           See "BIGNUM SECURITY CONSIDERATIONS" for more info.

       21, 22, 23 (expected later JSON conversion)
           CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore
           these tags.

       32 (URI)
           These objects decode into URI objects. The corresponding
           "URI::TO_CBOR" method again results in a CBOR URI value.
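       The bignum handling described above can be checked with the core
       Math::BigInt module. In this sketch the sample numbers are arbitrary;
       the first byte 0xc2 is simply how CBOR encodes tag 2.

```perl
          use strict;
          use warnings;
          use CBOR::XS;
          use Math::BigInt;

          my $big   = Math::BigInt->new ("123456789012345678901234567890"); # > 2**64
          my $small = Math::BigInt->new (42);

          my $big_cbor   = encode_cbor ($big);    # tag 2 (positive bignum)
          my $small_cbor = encode_cbor ($small);  # a normal CBOR integer

          my $big_back   = decode_cbor ($big_cbor);   # a Math::BigInt again
          my $small_back = decode_cbor ($small_cbor); # a plain Perl number
```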


CBOR and JSON

       CBOR is supposed to implement a superset of the JSON data model, and
       is, with some coercion, able to represent all JSON texts (something
       that other "binary JSON" formats such as BSON generally do not
       support).

       CBOR implements some extra hints and support for JSON interoperability,
       and the spec offers further guidance for conversion between CBOR and
       JSON. None of this is currently implemented in CBOR::XS, and the
       guidelines in the spec do not result in correct round-tripping of data.
       If JSON interoperability is improved in the future, then the goal will
       be to ensure that decoded JSON data will round-trip encoding and
       decoding to CBOR intact.


SECURITY CONSIDERATIONS

       TL;DR: if you want to decode or encode CBOR from untrusted sources,
       you should start with a coder object created via "new_safe":

          my $coder = CBOR::XS->new_safe;

          my $data = $coder->decode ($cbor_text);
          my $cbor = $coder->encode ($data);

       Longer version: When you are using CBOR in a protocol, talking to
       untrusted, potentially hostile creatures requires some thought:

       Security of the CBOR decoder itself
           First and foremost, your CBOR decoder should be secure, that is,
           should not have any buffer overflows or similar bugs that could
           potentially be exploited. Obviously, this module should ensure that
           and I am trying hard to make that true, but you never know.

       CBOR::XS can invoke almost arbitrary callbacks during decoding
           CBOR::XS supports object serialisation - decoding CBOR can cause
           calls to any "THAW" method in any package that exists in your
           process (that is, CBOR::XS will not try to load modules, but any
           existing "THAW" method or function can be called, so they all have
           to be secure).

           Less obviously, it will also invoke "TO_CBOR" and "FREEZE" methods
           during encoding - even if all your "THAW" methods are secure,
           encoding data structures from untrusted sources can invoke those
           and trigger bugs in them.

           So, if you are not sure about the security of all the modules you
           have loaded (you shouldn't be), you should disable this part using
           "forbid_objects".

       CBOR can be extended with tags that call library code
           CBOR can be extended with tags, and "CBOR::XS" has a registry of
           conversion functions for many existing tags that can be extended
           via third-party modules (see the "filter" method).

           If you don't trust these, you should configure the "safe" filter
           function, "CBOR::XS::safe_filter", which by default only includes
           conversion functions that are considered "safe" by the author (but
           again, they can be extended by third party modules).

           Depending on your level of paranoia, you can use the "safe" filter:

              $cbor->filter (\&CBOR::XS::safe_filter);

           ... your own filter...

              $cbor->filter (sub { ... do your stuff here ... });

           ... or even no filter at all, disabling all tag decoding:

              $cbor->filter (sub { });

           This is never a problem for encoding, as the tag mechanism only
           exists in CBOR texts.

       Resource-starving attacks: object memory usage
           You need to avoid resource-starving attacks. That means you should
           limit the size of CBOR data you accept, or make sure that when your
           resources run out, that's just fine (e.g. by using a separate
           process that can crash safely). The size of a CBOR string in octets
           is usually a good indication of the size of the resources required
           to decode it into a Perl structure. While CBOR::XS can check the
           size of the CBOR text (using "max_size"), it might be too late when
           you already have it in memory, so you might want to check the size
           before you accept the string.

           As for encoding, it is possible to construct data structures that
           are relatively small but result in large CBOR texts (for example by
           having an array full of references to the same big data structure,
           which will all be deep-cloned during encoding by default). This is
           rarely an actual issue (and the worst case is still just running
           out of memory), but you can reduce this risk by using
           "allow_sharing".

       Resource-starving attacks: stack overflows
           CBOR::XS recurses using the C stack when decoding objects and
           arrays. The C stack is a limited resource: for instance, on my
           amd64 machine with 8MB of stack size I can decode around 180k
           nested arrays but only 14k nested CBOR objects (due to perl itself
           recursing deeply on croak to free the temporary). If that is
           exceeded, the program crashes. To be conservative, the default
           nesting limit is set to 512. If your process has a smaller stack,
           you should adjust this setting accordingly with the "max_depth"
           method.

       Resource-starving attacks: CPU en-/decoding complexity
           CBOR::XS will use the Math::BigInt, Math::BigFloat and Math::BigRat
           libraries to encode/decode bignums. These can be very slow (as in,
           centuries of CPU time) and can even crash your program (and are
           generally not very trustworthy). See the next section for details.

       Data breaches: leaking information in error messages
           CBOR::XS might leak contents of your Perl data structures in its
           error messages, so when you serialise sensitive information you
           might want to make sure that exceptions thrown by CBOR::XS will not
           end up in front of untrusted eyes.

       Something else...
           Something else could bomb you, too, that I forgot to think of. In
           that case, you get to keep the pieces. I am always open to hints,
           though...
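       The individual settings mentioned above can also be combined by hand
       on a single coder. The following is a sketch in the spirit of this
       advice, not a description of what "new_safe" does internally; the
       limits (depth 64, 1 MiB) are example values to be tuned for your
       process.

```perl
          use strict;
          use warnings;
          use CBOR::XS;

          my $coder = CBOR::XS->new
             ->forbid_objects                    # no THAW/FREEZE/TO_CBOR calls
             ->filter (\&CBOR::XS::safe_filter)  # only "safe" tag converters
             ->max_depth (64)                    # example nesting limit
             ->max_size (1024 * 1024);           # example limit: 1 MiB of CBOR

          # 101 nested arrays: 0x81 is "array of length 1", 0x80 the empty array
          my $deep = encode_cbor ([]);
          $deep = "\x81" . $deep for 1 .. 100;

          # exceeds max_depth, so decoding croaks and $ok stays undef
          my $ok = eval { $coder->decode ($deep); 1 };
```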


BIGNUM SECURITY CONSIDERATIONS

       CBOR::XS provides a "TO_CBOR" method for both Math::BigInt and
       Math::BigFloat that tries to encode the number in the simplest possible
       way, that is, either a CBOR integer, a CBOR bigint (tags 2 and 3), a
       decimal fraction (tag 4) or an arbitrary-exponent decimal fraction
       (tag 264). Rational numbers (Math::BigRat, tag 30) can also contain
       bignums as members.

       CBOR::XS will also understand base-2 bigfloat or arbitrary-exponent
       bigfloats (tags 5 and 265), but it will never generate these on its
       own.

       Using the built-in Math::BigInt::Calc support, encoding and decoding
       decimal fractions is generally fast. Decoding bigints can be slow for
       very big numbers (tens of thousands of digits, something that could
       potentially be caught by limiting the size of CBOR texts), and decoding
       bigfloats or arbitrary-exponent bigfloats can be extremely slow
       (minutes, decades) for large exponents (roughly 40 bit and longer).

       Additionally, Math::BigInt can take advantage of other bignum
       libraries, such as Math::GMP, which cannot handle big floats with large
       exponents, and might simply abort or crash your program, due to their
       code quality.

       This can be a concern if you want to parse untrusted CBOR. If it is,
       you might want to disable decoding of tags 2 (bigint) and 3 (negative
       bigint). You should also disable tags 5 and 265, as these can be slow
       even without bigints.

       Disabling bigints will also partially or fully disable types that rely
       on them, e.g. rational numbers that use bignums.
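       One way to disable these tags is a filter that returns no values for
       them, so they stay undecoded "CBOR::XS::Tagged" objects and no bignum
       code runs, while all other tags go through "CBOR::XS::safe_filter". A
       minimal sketch; the hash name %refused is illustrative:

```perl
          use strict;
          use warnings;
          use CBOR::XS;

          # tags named above as potentially expensive to decode
          my %refused = map { $_ => 1 } 2, 3, 5, 265;

          my $coder = CBOR::XS->new->filter (sub {
             my ($tag, $value) = @_;
             return if $refused{$tag};              # no value: keep it as CBOR::XS::Tagged
             CBOR::XS::safe_filter ($tag, $value)   # everything else: the "safe" filter
          });

          # a positive bignum (tag 2) holding 2**128 as 17 big-endian bytes
          my $cbor = encode_cbor (CBOR::XS::tag (2, "\x01" . ("\x00" x 16)));

          my $result = $coder->decode ($cbor);   # CBOR::XS::Tagged, no Math::BigInt involved
```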


CBOR IMPLEMENTATION NOTES

       This section contains some random implementation notes. They do not
       describe guaranteed behaviour, but merely the behaviour as it is
       implemented right now.

       64 bit integers are only properly decoded when Perl was built with 64
       bit support.

       Strings and arrays are encoded with a definite length. Hashes as well,
       unless they are tied (or otherwise magical).

       Only the double data type is supported for NV data types - when Perl
       uses long double to represent floating point values, they might not be
       encoded properly. Half precision types are accepted, but not encoded.

       Strict mode and canonical mode are not implemented.


LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT

       On perls that were built without 64 bit integer support (these are rare
       nowadays, even on 32 bit architectures, as all major Perl distributions
       are built with 64 bit integer support), support for any kind of 64 bit
       integer in CBOR is very limited - most likely, these 64 bit values will
       be truncated, corrupted, or otherwise not decoded correctly. This also
       includes string, array and map sizes that are stored as 64 bit
       integers.
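       Code that must handle such values can check for 64 bit integer support
       at runtime first. A minimal sketch using the core Config module, whose
       "ivsize" entry is the size of a Perl integer in bytes:

```perl
          use strict;
          use warnings;
          use Config;
          use CBOR::XS;

          # ivsize is the size of a Perl integer in bytes; 8 means 64 bit support
          my $has_64bit_ints = $Config{ivsize} >= 8;

          if ($has_64bit_ints) {
             # 2**32 does not fit into 32 bits, so CBOR uses the 8-byte form
             my $back = decode_cbor (encode_cbor (4294967296));
             print "round-tripped: $back\n";
          } else {
             warn "no 64 bit integer support: large CBOR integers are unreliable\n";
          }
```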


THREADS

       This module is not guaranteed to be thread safe and there are no plans
       to change this until Perl gets thread support (as opposed to the
       horribly slow so-called "threads" which are simply slow and bloated
       process simulations - use fork, it's much faster, cheaper, better).

       (It might actually work, but you have been warned).


BUGS

       While the goal of this module is to be correct, that unfortunately does
       not mean it's bug-free, only that I think its design is bug-free. If
       you keep reporting bugs they will be fixed swiftly, though.

       Please refrain from using rt.cpan.org or any other bug reporting
       service. I put the contact address into my modules for a reason.


SEE ALSO

       The JSON and JSON::XS modules that do similar, but human-readable,
       serialisation.

       The Types::Serialiser module provides the data model for true, false
       and error values.


AUTHOR

        Marc Lehmann <schmorp@schmorp.de>
        http://home.schmorp.de/



perl v5.30.1                      2020-01-29                             XS(3)