CBOR::XS(3pm)

XS(3)                 User Contributed Perl Documentation                XS(3)

NAME

       CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)

SYNOPSIS

        use CBOR::XS;

        $binary_cbor_data = encode_cbor $perl_value;
        $perl_value       = decode_cbor $binary_cbor_data;

        # OO-interface

        $coder = CBOR::XS->new;
        $binary_cbor_data = $coder->encode ($perl_value);
        $perl_value       = $coder->decode ($binary_cbor_data);

        # prefix decoding

        my $many_cbor_strings = ...;
        while (length $many_cbor_strings) {
           my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
           # data was decoded
           substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
        }

DESCRIPTION

       This module converts Perl data structures to the Concise Binary Object
       Representation (CBOR) and vice versa. CBOR is a fast binary
       serialisation format that aims to use an (almost) superset of the JSON
       data model, i.e. when you can represent something useful in JSON, you
       should be able to represent it in CBOR.

       In short, CBOR is a faster and quite compact binary alternative to
       JSON, with the added ability of supporting serialisation of Perl
       objects. (JSON often compresses better than CBOR, though, so if you
       plan to compress the data later and speed is less important, you might
       want to compare both formats first.)

       The primary goal of this module is to be correct and the secondary goal
       is to be fast. To reach the latter goal it was written in C.

       To give you a general idea about speed, with texts in the megabyte
       range, "CBOR::XS" usually encodes roughly twice as fast as Storable or
       JSON::XS and decodes about 15%-30% faster than those. The shorter the
       data, the worse Storable performs in comparison.

       Regarding compactness, "CBOR::XS"-encoded data structures are usually
       about 20% smaller than the same data encoded as (compact) JSON or
       Storable.

       In addition to the core CBOR data format, this module implements a
       number of extensions, to support cyclic and shared data structures (see
       "allow_sharing" and "allow_cycles"), string deduplication (see
       "pack_strings") and scalar references (always enabled).

       See MAPPING, below, for how CBOR::XS maps perl values to CBOR values and
       vice versa.

FUNCTIONAL INTERFACE

       The following convenience functions are provided by this module. They
       are exported by default:

       $cbor_data = encode_cbor $perl_scalar
           Converts the given Perl data structure to CBOR representation.
           Croaks on error.

       $perl_scalar = decode_cbor $cbor_data
           The opposite of "encode_cbor": expects a valid CBOR string to
           parse, returning the resulting perl scalar. Croaks on error.

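       A minimal round-trip sketch of the two functions, assuming CBOR::XS is
       installed; the data structure is arbitrary and only serves as an
       illustration:

```perl
use strict;
use warnings;
use CBOR::XS;    # exports encode_cbor and decode_cbor by default

# an arbitrary nested Perl structure
my $data = { name => "example", list => [1, 2, 3] };

# Perl -> CBOR (a binary octet string)
my $cbor = encode_cbor $data;

# CBOR -> Perl (a new, independent copy of the structure)
my $copy = decode_cbor $cbor;

print "list has ", scalar @{ $copy->{list} }, " elements\n";
```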

OBJECT-ORIENTED INTERFACE

       The object oriented interface lets you configure your own encoding or
       decoding style, within the limits of supported formats.

       $cbor = new CBOR::XS
           Creates a new CBOR::XS object that can be used to de/encode CBOR
           strings. All boolean flags described below are by default disabled.

           The mutators for flags all return the CBOR object again and thus
           calls can be chained:

              my $cbor_data = CBOR::XS->new->text_keys->encode ({a => [1,2]});

       $cbor = new_safe CBOR::XS
           Creates a new, safe/secure CBOR::XS object. This is similar to
           "new", but configures the coder object to be safe to use with
           untrusted data. Currently, this is equivalent to:

              my $cbor = CBOR::XS
                 ->new
                 ->validate_utf8
                 ->forbid_objects
                 ->filter (\&CBOR::XS::safe_filter)
                 ->max_size (1e8);

           but is more future-proof (it is better to crash because of a
           change than to be exploited in other ways).

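           The following sketch contrasts the default coder with one created
           by "new_safe" on a hand-assembled tagged value: tag 2 (a positive
           bignum) wrapping the one-octet byte string "\x05". It assumes, as
           described under "filter" below, that the default filter turns tag 2
           into a Math::BigInt, while the safe filter, which omits the bignum
           tags, leaves it as a "CBOR::XS::Tagged" object:

```perl
use strict;
use warnings;
use CBOR::XS;

# hand-assembled CBOR: tag 2 (0xc2) applied to the byte string "\x05" (0x41 0x05)
my $bignum_cbor = "\xc2\x41\x05";

# default coder: tag 2 is handled by %CBOR::XS::FILTER
my $default = CBOR::XS->new->decode ($bignum_cbor);

# new_safe: CBOR::XS::safe_filter has no bignum entries, so the tag is kept
my $safe = CBOR::XS->new_safe->decode ($bignum_cbor);

printf "default: %s, safe: %s\n", ref $default, ref $safe;
```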
       $cbor = $cbor->max_depth ([$maximum_nesting_depth])
       $max_depth = $cbor->get_max_depth
           Sets the maximum nesting level (default 512) accepted while
           encoding or decoding. If a higher nesting level is detected in CBOR
           data or a Perl data structure, then the encoder and decoder will
           stop and croak at that point.

           Nesting level is defined by the number of hash or array references
           that the encoder needs to traverse to reach a given point, or the
           number of CBOR maps or arrays the decoder has to descend into to
           reach a given item.

           Setting the maximum depth to one disallows any nesting: a single
           hash (map) or array is accepted, but not one that contains further
           hashes or arrays.

           If no argument is given, the highest possible setting will be used,
           which is rarely useful.

           Note that nesting is implemented by recursion in C. The default
           value has been chosen to be as large as typical operating systems
           allow without crashing.

           See "SECURITY CONSIDERATIONS", below, for more info on why this is
           useful.

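           A small sketch of the depth limit in action, assuming CBOR::XS is
           installed; an array holding only scalars has depth 1, an array
           inside an array has depth 2:

```perl
use strict;
use warnings;
use CBOR::XS;

# max_depth (1): a single array or map is fine, nesting is not
my $coder = CBOR::XS->new->max_depth (1);

my $flat   = encode_cbor [1, 2, 3];   # depth 1
my $nested = encode_cbor [[1], 2];    # depth 2

my $ok = $coder->decode ($flat);

# exceeding the limit croaks, so trap the exception with eval
my $rejected = !eval { $coder->decode ($nested); 1 };
print $rejected ? "nested input rejected: $@" : "nested input accepted\n";
```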
       $cbor = $cbor->max_size ([$maximum_string_size])
       $max_size = $cbor->get_max_size
           Sets the maximum length (in bytes) a CBOR string may have for
           decoding to be attempted. The default is 0, meaning no limit. When
           "decode" is called on a string that is longer than this many
           bytes, it will not attempt to decode the string but throw an
           exception. This setting has no effect on "encode" (yet).

           If no argument is given, the limit check will be deactivated (same
           as when 0 is specified).

           See "SECURITY CONSIDERATIONS", below, for more info on why this is
           useful.

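           A sketch of the size check; the input lengths follow from the CBOR
           encoding rules (small integers fit into a single octet, and a short
           string gets a one-octet header in front of its content):

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new->max_size (4);   # accept at most 4 octets of input

my $small = encode_cbor 1;          # 1 octet: 0x01
my $large = encode_cbor "hello";    # 6 octets: 0x45 followed by "hello"

my $value    = $coder->decode ($small);
my $rejected = !eval { $coder->decode ($large); 1 };
```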
       $cbor = $cbor->allow_unknown ([$enable])
       $enabled = $cbor->get_allow_unknown
           If $enable is true (or missing), then "encode" will not throw an
           exception when it encounters values it cannot represent in CBOR
           (for example, filehandles) but instead will encode a CBOR "error"
           value.

           If $enable is false (the default), then "encode" will throw an
           exception when it encounters anything it cannot encode as CBOR.

           This option does not affect "decode" in any way, and it is
           recommended to leave it off unless you know your communication
           partner can cope with such "error" values.

       $cbor = $cbor->allow_sharing ([$enable])
       $enabled = $cbor->get_allow_sharing
           If $enable is true (or missing), then "encode" will not
           double-encode values that have been referenced before (e.g. when
           the same object, such as an array, is referenced multiple times),
           but instead will emit a reference to the earlier value.

           This means that such values will only be encoded once, and will not
           result in a deep cloning of the value on decode, in decoders
           supporting the value sharing extension. This also makes it possible
           to encode cyclic data structures (which need "allow_cycles" to be
           enabled to be decoded by this module).

           It is recommended to leave it off unless you know your
           communication partner supports the value sharing extensions to CBOR
           (<http://cbor.schmorp.de/value-sharing>), as without decoder
           support, the resulting data structure might be unusable.

           Detecting shared values incurs a runtime overhead when values are
           encoded that have a reference counter larger than one, and might
           unnecessarily increase the encoded size, as potentially shared
           values are encoded as shareable whether or not they are actually
           shared.

           At the moment, only targets of references can be shared (e.g.
           scalars, arrays or hashes pointed to by a reference). Weirder
           constructs, such as an array with multiple "copies" of the same
           string, which are hard but not impossible to create in Perl, are
           not supported (this is the same as with Storable).

           If $enable is false (the default), then "encode" will encode shared
           data structures repeatedly, unsharing them in the process. Cyclic
           data structures cannot be encoded in this mode.

           This option does not affect "decode" in any way - shared values and
           references will always be decoded properly if present.

       $cbor = $cbor->allow_cycles ([$enable])
       $enabled = $cbor->get_allow_cycles
           If $enable is true (or missing), then "decode" will happily decode
           self-referential (cyclic) data structures. By default these will
           not be decoded, as they need manual cleanup to avoid memory leaks,
           so code that isn't prepared for this will not leak memory.

           If $enable is false (the default), then "decode" will throw an
           error when it encounters a self-referential/cyclic data structure.

           FUTURE DIRECTION: the motivation behind this option is to avoid
           real cycles - future versions of this module might choose to decode
           cyclic data structures using weak references when this option is
           off, instead of throwing an error.

           This option does not affect "encode" in any way - shared values and
           references will always be encoded properly if present.

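           The following sketch shows "allow_sharing" and "allow_cycles"
           together, assuming CBOR::XS is installed: a shared array survives
           the round trip as one array only when sharing is enabled, and a
           self-referential array is refused on decode unless cycles are
           allowed. The cycles are broken by hand at the end, as described
           above:

```perl
use strict;
use warnings;
use CBOR::XS;

my $shared = [1, 2];
my $data   = [$shared, $shared];   # the same array referenced twice

# default: the array is encoded twice and decodes as two separate copies
my $unshared = decode_cbor encode_cbor $data;

# allow_sharing: encoded once, decodes as a single shared array
my $resolved = decode_cbor (CBOR::XS->new->allow_sharing->encode ($data));

# a self-referential array needs allow_sharing to be encoded at all...
my $cycle = [];
push @$cycle, $cycle;
my $cycle_cbor = CBOR::XS->new->allow_sharing->encode ($cycle);

# ...and allow_cycles to be decoded
my $refused   = !eval { decode_cbor $cycle_cbor; 1 };
my $loop      = CBOR::XS->new->allow_cycles->decode ($cycle_cbor);
my $is_cyclic = $loop->[0] == $loop;

# break both cycles by hand, otherwise the arrays are never freed
@$loop  = ();
@$cycle = ();
```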
       $cbor = $cbor->forbid_objects ([$enable])
       $enabled = $cbor->get_forbid_objects
           Disables the use of the object serialiser protocol.

           If $enable is true (or missing), then "encode" will throw an
           exception when it encounters perl objects that would be encoded
           using the perl-object tag (26). When "decode" encounters such tags,
           it will fall back to the general filter/tagged logic as if this
           were an unknown tag (by default resulting in a "CBOR::XS::Tagged"
           object).

           If $enable is false (the default), then "encode" will use the
           Types::Serialiser object serialisation protocol to serialise
           objects into perl-object tags, and "decode" will do the same to
           decode such tags.

           See "SECURITY CONSIDERATIONS", below, for more info on why
           forbidding this protocol can be useful.

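           A sketch using a hypothetical class "My::Point" that implements the
           FREEZE/THAW methods of the Types::Serialiser object serialisation
           protocol. By default the object round-trips, while
           "forbid_objects" makes "encode" croak:

```perl
use strict;
use warnings;
use CBOR::XS;

# hypothetical class implementing the Types::Serialiser FREEZE/THAW protocol
package My::Point {
   sub new    { my ($class, %args) = @_; bless { %args }, $class }
   sub FREEZE { my ($self, $serialiser) = @_; ($self->{x}, $self->{y}) }
   sub THAW   { my ($class, $serialiser, $x, $y) = @_; $class->new (x => $x, y => $y) }
}

my $point = My::Point->new (x => 3, y => 4);

# default: encoded with the perl-object tag (26), restored through THAW
my $copy = decode_cbor encode_cbor $point;

# forbid_objects: encoding the same object croaks instead
my $forbidden = !eval { CBOR::XS->new->forbid_objects->encode ($point); 1 };
```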
       $cbor = $cbor->pack_strings ([$enable])
       $enabled = $cbor->get_pack_strings
           If $enable is true (or missing), then "encode" will try not to
           encode the same string twice, but will instead encode a reference
           to the string. Depending on your data format, this can save a lot
           of space, but also results in a very large runtime overhead
           (expect encoding times to be 2-4 times as high as without).

           It is recommended to leave it off unless you know your
           communication partner supports the stringref extension to CBOR
           (<http://cbor.schmorp.de/stringref>), as without decoder support,
           the resulting data structure might not be usable.

           If $enable is false (the default), then "encode" will encode
           strings the standard CBOR way.

           This option does not affect "decode" in any way - string references
           will always be decoded properly if present.

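           A rough sketch comparing encoded sizes for data with many repeated
           strings. The exact sizes depend on the data, so only the relative
           size is checked here:

```perl
use strict;
use warnings;
use CBOR::XS;

# the same (reasonably long) string repeated many times
my $data = [ ("a fairly long repeated string") x 20 ];

my $plain  = CBOR::XS->new->encode ($data);
my $packed = CBOR::XS->new->pack_strings->encode ($data);

printf "plain: %d octets, packed: %d octets\n", length $plain, length $packed;

# CBOR::XS always decodes string references, so no option is needed here
my $roundtrip = decode_cbor $packed;
```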
       $cbor = $cbor->text_keys ([$enable])
       $enabled = $cbor->get_text_keys
           If $enable is true (or missing), then "encode" will encode all
           perl hash keys as CBOR text strings/UTF-8 strings, upgrading them
           as needed.

           If $enable is false (the default), then "encode" will encode hash
           keys normally - upgraded perl strings (strings internally encoded
           as UTF-8) as CBOR text strings, and downgraded perl strings as CBOR
           byte strings.

           This option does not affect "decode" in any way.

           This option is useful for interoperability with CBOR decoders that
           don't treat byte strings as a form of text. It is especially useful
           as Perl gives very little control over hash keys.

           Enabling this option can be slow, as all downgraded hash keys that
           are encoded need to be scanned and converted to UTF-8.

       $cbor = $cbor->text_strings ([$enable])
       $enabled = $cbor->get_text_strings
           This option works similarly to "text_keys", above, but works on all
           strings (including hash keys), so "text_keys" has no further effect
           after enabling "text_strings".

           If $enable is true (or missing), then "encode" will encode all
           perl strings as CBOR text strings/UTF-8 strings, upgrading them as
           needed.

           If $enable is false (the default), then "encode" will encode
           strings normally (but see "text_keys") - upgraded perl strings
           (strings internally encoded as UTF-8) as CBOR text strings, and
           downgraded perl strings as CBOR byte strings.

           This option does not affect "decode" in any way.

           This option has similar advantages and disadvantages to
           "text_keys". In addition, this option effectively removes the
           ability to automatically encode byte strings, which might break
           some "FREEZE" and "TO_CBOR" methods that rely on this.

           A workaround is to use explicit type casts (see "TYPE CASTS",
           below), which are unaffected by this option.

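           The effect of "text_keys" and "text_strings" is visible in the
           first octet of each encoded string: 0x43 is a byte string of length
           3 (major type 2) and 0x63 a text string of length 3 (major type 3).
           The sketch below hex-dumps a few encodings to show this, assuming
           CBOR::XS is installed:

```perl
use strict;
use warnings;
use CBOR::XS;

my $string = "abc";   # a downgraded (non-UTF-8) perl string

# default: downgraded strings become byte strings
my $bytes = unpack "H*", CBOR::XS->new->encode ($string);

# text_strings: every string becomes a text string
my $text = unpack "H*", CBOR::XS->new->text_strings->encode ($string);

# text_keys: only the hash key becomes text, the value stays a byte string
my $keys = unpack "H*", CBOR::XS->new->text_keys->encode ({ $string => $string });

print "$bytes\n$text\n$keys\n";
```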
       $cbor = $cbor->validate_utf8 ([$enable])
       $enabled = $cbor->get_validate_utf8
           If $enable is true (or missing), then "decode" will validate that
           elements (text strings) containing UTF-8 data in fact contain valid
           UTF-8 data (instead of blindly accepting it). This validation
           obviously takes extra time during decoding.

           The concept of "valid UTF-8" used is perl's concept, which is a
           superset of the official UTF-8.

           If $enable is false (the default), then "decode" will blindly
           accept UTF-8 data, marking them as valid UTF-8 in the resulting
           data structure regardless of whether that's true or not.

           Perl isn't too happy about corrupted UTF-8 in strings, but should
           generally not crash or do similarly evil things. Extensions might
           not be so forgiving, so it's recommended to turn on this setting if
           you receive untrusted CBOR.

           This option does not affect "encode" in any way - strings that are
           supposedly valid UTF-8 will simply be dumped into the resulting
           CBOR string without checking whether that is, in fact, true or not.

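           A sketch of "validate_utf8" on hand-assembled input, where 0x62 is
           the CBOR header for a text string of length 2:

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new->validate_utf8;

# "\xc3\xa9" is the valid UTF-8 encoding of U+00E9
my $valid = $coder->decode ("\x62\xc3\xa9");

# "a\x80" is not valid UTF-8 (0x80 is a stray continuation octet)
my $rejected = !eval { $coder->decode ("\x62\x61\x80"); 1 };
```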
       $cbor = $cbor->filter ([$cb->($tag, $value)])
       $cb_or_undef = $cbor->get_filter
           Sets or replaces the tagged value decoding filter (when $cb is
           specified) or clears the filter (if no argument or "undef" is
           provided).

           The filter callback is called only during decoding, when a
           non-enforced tagged value has been decoded (see "TAG HANDLING AND
           EXTENSIONS" for a list of enforced tags). For specific tags, it's
           often better to provide a default converter using the
           %CBOR::XS::FILTER hash (see below).

           The first argument is the numerical tag, the second is the
           (decoded) value that has been tagged.

           The filter function should return either exactly one value, which
           will replace the tagged value in the decoded data structure, or no
           values, which will result in default handling, which currently
           means the decoder creates a "CBOR::XS::Tagged" object to hold the
           tag and the value.

           When the filter is cleared (the default state), the default filter
           function, "CBOR::XS::default_filter", is used. This function simply
           looks up the tag in the %CBOR::XS::FILTER hash. If an entry exists
           it must be a code reference that is called with tag and value, and
           is responsible for decoding the value. If no entry exists, it
           returns no values. "CBOR::XS" provides a number of default filter
           functions already, and the %CBOR::XS::FILTER hash can be freely
           extended with more.

           "CBOR::XS" additionally provides an alternative filter function
           that is supposed to be safe to use with untrusted data (which the
           default filter might not), called "CBOR::XS::safe_filter", which
           works the same as the "default_filter" but uses the
           %CBOR::XS::SAFE_FILTER variable instead. It is prepopulated with
           the tag decoding functions that are deemed safe (basically the same
           as %CBOR::XS::FILTER without all the bignum tags), and can be
           extended by user code as well, although, obviously, one should be
           very careful about adding decoding functions here, since the
           expectation is that they are safe to use on untrusted data, after
           all.

           Example: decode all tags not handled internally into
           "CBOR::XS::Tagged" objects, with no other special handling (useful
           when working with potentially "unsafe" CBOR data).

              CBOR::XS->new->filter (sub { })->decode ($cbor_data);

           Example: provide a global filter for tag 1347375694, converting the
           value into some string form.

              $CBOR::XS::FILTER{1347375694} = sub {
                 my ($tag, $value) = @_;

                 "tag 1347375694 value $value"
              };

           Example: provide your own filter function that looks up tags in
           your own hash:

              my %my_filter = (
                 998347484 => sub {
                    my ($tag, $value) = @_;

                    "tag 998347484 value $value"
                 },
              );

              my $coder = CBOR::XS->new->filter (sub {
                 &{ $my_filter{$_[0]} or return }
              });

           Example: use the safe filter function (see "SECURITY
           CONSIDERATIONS" for more considerations on security).

              CBOR::XS->new->filter (\&CBOR::XS::safe_filter)->decode ($cbor_data);

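           A self-contained sketch tying these pieces together: a value is
           tagged with "CBOR::XS::tag", decoded once without a filter (giving
           a "CBOR::XS::Tagged" object) and once with a per-coder filter that
           handles only that tag. The tag number is arbitrary and chosen only
           for illustration:

```perl
use strict;
use warnings;
use CBOR::XS;

my $TAG = 1347375694;   # arbitrary tag number, for illustration only

# CBOR::XS::tag creates a CBOR::XS::Tagged object that encodes as a tagged value
my $cbor = encode_cbor CBOR::XS::tag ($TAG, "payload");

# no filter entry for this tag: the result is a CBOR::XS::Tagged object
my $tagged = decode_cbor $cbor;
print ref $tagged, " tag=$tagged->[0] value=$tagged->[1]\n";

# a per-coder filter that handles this tag and returns nothing for all
# others, which selects the default handling described above
my $coder = CBOR::XS->new->filter (sub {
   my ($tag, $value) = @_;
   return "converted: $value" if $tag == $TAG;
   return;
});

my $converted = $coder->decode ($cbor);
```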
       $cbor_data = $cbor->encode ($perl_scalar)
           Converts the given Perl data structure (a scalar value) to its CBOR
           representation.

       $perl_scalar = $cbor->decode ($cbor_data)
           The opposite of "encode": expects CBOR data and tries to parse it,
           returning the resulting simple scalar or reference. Croaks on
           error.

       ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
           This works like the "decode" method, but instead of raising an
           exception when there is trailing garbage after the CBOR string, it
           will silently stop parsing there and return the number of octets
           consumed so far.

           This is useful if your CBOR texts are not delimited by an outer
           protocol and you need to know where the first CBOR string ends and
           the next one starts - CBOR strings are self-delimited, so it is
           possible to concatenate CBOR strings without any delimiters or size
           fields and recover their data.

              CBOR::XS->new->decode_prefix ("......")
              => ("...", 3)

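           A concrete version of the prefix-decoding loop from the SYNOPSIS,
           using two values whose encodings are short enough to follow by
           hand:

```perl
use strict;
use warnings;
use CBOR::XS;

# two CBOR values concatenated without any delimiter:
# 0x01 (the integer 1) followed by 0x82 0x01 0x02 (the array [1, 2])
my $stream = encode_cbor (1) . encode_cbor ([1, 2]);

my $coder = CBOR::XS->new;
my @values;

while (length $stream) {
   my ($value, $octets) = $coder->decode_prefix ($stream);
   push @values, $value;
   substr $stream, 0, $octets, "";   # remove the octets just decoded
}
```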
   INCREMENTAL PARSING
       In some cases, there is the need for incremental parsing of CBOR texts.
       While this module always has to keep both CBOR text and resulting Perl
       data structure in memory at one time, it does allow you to parse a CBOR
       stream incrementally, similar to using "decode_prefix" to see if a full
       CBOR object is available, but much more efficiently.

       It basically works by parsing as much of a CBOR string as possible - if
       the CBOR data is not complete yet, the parser will remember where it
       was, to be able to restart when more data has been accumulated. Once
       enough data is available to either decode a complete CBOR value or
       raise an error, a real decode will be attempted.

       A typical use case would be a network protocol that consists of sending
       and receiving CBOR-encoded messages. The solution that works with CBOR
       and about anything else is to prepend a length to every CBOR value, so
       the receiver knows how many octets to read. More compact (and slightly
       slower) would be to just send CBOR values back-to-back, as "CBOR::XS"
       knows where a CBOR value ends, and doesn't need an explicit length.

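       For the length-prefix approach, a minimal sketch could look like the
       following. The 32-bit big-endian length field and the helper names
       "frame" and "unframe" are assumptions made for this example, not part
       of "CBOR::XS":

```perl
use strict;
use warnings;
use CBOR::XS;

# hypothetical helper: prefix one CBOR-encoded value with its 32-bit big-endian length
sub frame {
   my ($value) = @_;
   return pack "N/a*", encode_cbor $value;
}

# hypothetical helper: remove and decode every complete message at the
# start of the buffer passed as first argument (modified in place)
sub unframe {
   my @messages;

   while (length $_[0] >= 4) {
      my $len = unpack "N", $_[0];
      last if length $_[0] < 4 + $len;       # message not complete yet

      my $payload = substr $_[0], 4, $len;
      substr $_[0], 0, 4 + $len, "";         # drop length field and payload
      push @messages, decode_cbor $payload;
   }

   return @messages;
}

my $wire = frame ({ id => 1 }) . frame ([1, 2, 3]);

# simulate two reads: everything but the last octet, then the rest
my $buffer = substr $wire, 0, -1;
my @first  = unframe ($buffer);      # only the first message is complete

$buffer .= substr $wire, -1;
my @second = unframe ($buffer);
```

       The back-to-back alternative needs no such helpers; it is what the
       incremental parser below is for.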
       The following methods help with incremental parsing:

       @decoded = $cbor->incr_parse ($buffer)
           This method attempts to decode exactly one CBOR value from the
           beginning of the given $buffer. The value is removed from the
           $buffer on success. When $buffer doesn't contain a complete value
           yet, it returns nothing. Finally, when the $buffer doesn't start
           with something that could ever be a valid CBOR value, it raises an
           exception, just as "decode" would. In the latter case the decoder
           state is undefined and must be reset before being able to parse
           further.

           This method modifies the $buffer in place. When no CBOR value can
           be decoded, the decoder stores the current string offset. On the
           next call, it continues decoding at the place where it stopped
           before. For this to make sense, the $buffer must begin with the
           same octets as on previous unsuccessful calls.

           You can call this method in scalar context, in which case it either
           returns a decoded value or "undef". This makes it impossible to
           distinguish between CBOR null values (which decode to "undef") and
           an unsuccessful decode, which is often acceptable.

       @decoded = $cbor->incr_parse_multiple ($buffer)
           Same as "incr_parse", but attempts to decode as many CBOR values as
           possible in one go, instead of at most one. Calls to "incr_parse"
           and "incr_parse_multiple" can be interleaved.

       $cbor->incr_reset
           Resets the incremental decoder. This throws away any saved state,
           so that subsequent calls to "incr_parse" or "incr_parse_multiple"
           start to parse a new CBOR value from the beginning of the $buffer
           again.

           This method can be called at any time, but it must be called if you
           want to change your $buffer or there was a decoding error and you
           want to reuse the $cbor object for future incremental parsings.

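       A sketch of incremental parsing with data arriving in pieces, assuming
       CBOR::XS is installed; the buffer contents are hand-assembled CBOR
       (0x82 starts an array of two elements):

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder  = CBOR::XS->new;
my $buffer = "";

# the array [1, 2] (0x82 0x01 0x02) arrives in two chunks
$buffer .= "\x82\x01";
my @first = $coder->incr_parse ($buffer);    # incomplete: returns nothing

$buffer .= "\x02";
my @second = $coder->incr_parse ($buffer);   # complete: one value, removed from $buffer

# several back-to-back values can be removed in one call
$buffer .= encode_cbor (1) . encode_cbor (2) . encode_cbor (3);
my @many = $coder->incr_parse_multiple ($buffer);
```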

MAPPING

       This section describes how CBOR::XS maps Perl values to CBOR values and
       vice versa. These mappings are designed to "do the right thing" in most
       circumstances automatically, preserving round-tripping characteristics
       (what you put in comes out as something equivalent).

       For the more enlightened: note that in the following descriptions,
       lowercase perl refers to the Perl interpreter, while uppercase Perl
       refers to the abstract Perl language itself.

   CBOR -> PERL
       integers
           CBOR integers become (numeric) perl scalars. On perls without 64
           bit support, 64 bit integers will be truncated or otherwise
           corrupted.

       byte strings
           Byte strings will become octet strings in Perl (the byte values
           0..255 will simply become characters of the same value in Perl).

       UTF-8 strings
           UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will
           be decoded into proper Unicode code points. Unless "validate_utf8"
           is enabled, the validity of the UTF-8 octets will not be checked -
           corrupt input will result in corrupted Perl strings.

       arrays, maps
           CBOR arrays and CBOR maps will be converted into references to a
           Perl array or hash, respectively. The keys of the map will be
           stringified during this process.

       null
           CBOR null becomes "undef" in Perl.

       true, false, undefined
           These CBOR values become "Types::Serialiser::true",
           "Types::Serialiser::false" and "Types::Serialiser::error",
           respectively. They are overloaded to act almost exactly like the
           numbers 1 and 0 (for true and false) or to throw an exception on
           access (for error). See the Types::Serialiser manpage for details.

       tagged values
           Tagged items consist of a numeric tag and another CBOR value.

           See "TAG HANDLING AND EXTENSIONS" and the description of "->filter"
           for details on which tags are handled how.

       anything else
           Anything else (e.g. unsupported simple values) will raise a
           decoding error.

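       A few of these mappings, demonstrated on hand-written CBOR input
       (0xf5, 0xf4 and 0xf6 are the simple values true, false and null; 0xa1
       starts a map with one key/value pair). The sketch assumes CBOR::XS and
       Types::Serialiser are installed:

```perl
use strict;
use warnings;
use CBOR::XS;
use Types::Serialiser;

my $true  = decode_cbor "\xf5";
my $false = decode_cbor "\xf4";
my $null  = decode_cbor "\xf6";

# {1: "x"}: key 0x01 (integer 1), value 0x61 0x78 (the text string "x")
my $map = decode_cbor "\xa1\x01\x61\x78";

# true and false are Types::Serialiser booleans that act like 1 and 0
printf "true=%d false=%d null=%s\n", $true, $false, defined $null ? "defined" : "undef";

# the integer key 1 has become the hash key "1"
my ($key) = keys %$map;
```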
   PERL -> CBOR
       The mapping from Perl to CBOR is slightly more difficult, as Perl is a
       typeless language. That means this module can only guess which CBOR
       type is meant by a perl value.

       hash references
           Perl hash references become CBOR maps. As there is no inherent
           ordering in hash keys (or CBOR maps), they will usually be encoded
           in a pseudo-random order. This order can be different each time a
           hash is encoded.

           Currently, tied hashes will use the indefinite-length format, while
           normal hashes will use the fixed-length format.

       array references
           Perl array references become fixed-length CBOR arrays.

       other references
           Other unblessed references will be represented using the
           indirection tag extension (tag value 22098,
           <http://cbor.schmorp.de/indirection>). CBOR decoders are guaranteed
           to be able to decode these values somehow, by either "doing the
           right thing", decoding into a generic tagged object, simply
           ignoring the tag, or something else.

       CBOR::XS::Tagged objects
           Objects of this type must be arrays consisting of a single "[tag,
           value]" pair. The (numerical) tag will be encoded as a CBOR tag,
           the value will be encoded as appropriate for the value. You must
           use "CBOR::XS::tag" to create such objects.

       Types::Serialiser::true, Types::Serialiser::false,
       Types::Serialiser::error
           These special values become CBOR true, CBOR false and CBOR
           undefined values, respectively. You can also use "\1", "\0" and
           "\undef" directly if you want.

       other blessed objects
           Other blessed objects are serialised via "TO_CBOR" or "FREEZE". See
           "TAG HANDLING AND EXTENSIONS" for specific classes handled by this
           module, and "OBJECT SERIALISATION" for generic object
           serialisation.

       simple scalars
           Simple Perl scalars (any scalar that is not a reference) are the
           most difficult objects to encode: CBOR::XS will encode undefined
           scalars as CBOR null values, scalars that have last been used in a
           string context before encoding as CBOR strings, and anything else
           as number values:

              # dump as number
              encode_cbor [2]                      # yields [2]
              encode_cbor [-3.0e17]                # yields [-3e+17]
              my $value = 5; encode_cbor [$value]  # yields [5]

              # used as string, so dump as string (either byte or text)
              print $value;
              encode_cbor [$value]                 # yields ["5"]

              # undef becomes null
              encode_cbor [undef]                  # yields [null]

           You can force the type to be a CBOR string by stringifying it:

              my $x = 3.1; # some variable containing a number
              "$x";        # stringified
              $x .= "";    # another, more awkward way to stringify
              print $x;    # perl does it for you, too, quite often

           You can force whether a string is encoded as byte or text string by
           using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is
           disabled).

              utf8::upgrade $x;   # encode $x as text string
              utf8::downgrade $x; # encode $x as byte string

           More options are available, see "TYPE CASTS", below, and the
           "text_keys" and "text_strings" options.

           Perl doesn't define what operations up- and downgrade strings, so
           if the difference between byte and text is important, you should
           up- or downgrade your string as late as possible before encoding.
           You can also force the use of CBOR text strings by using
           "text_keys" or "text_strings".

           You can force the type to be a CBOR number by numifying it:

              my $x = "3"; # some variable containing a string
              $x += 0;     # numify it, ensuring it will be dumped as a number
              $x *= 1;     # same thing, the choice is yours.

           Apart from the type casts described in "TYPE CASTS", you cannot
           currently force the type in other, less obscure, ways. Tell me if
           you need this capability (but don't forget to explain why it's
           needed :).

           Perl values that seem to be integers generally use the shortest
           possible representation. Floating-point values will use either the
           IEEE single format if possible without loss of precision, otherwise
           the IEEE double format will be used. Perls that use formats other
           than IEEE double to represent numerical values are supported, but
           might suffer loss of precision.

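           The following sketch hex-dumps some encodings to make these rules
           visible. "hex_cbor" is a hypothetical helper defined only for this
           example; the expected values in the comments follow from the CBOR
           encoding of integers and strings in RFC 7049:

```perl
use strict;
use warnings;
use CBOR::XS;

# hypothetical helper: hex dump of the CBOR encoding of one value
sub hex_cbor { unpack "H*", encode_cbor $_[0] }

my $x = "3";                        # a string...
my $as_string = hex_cbor $x;        # "4133": byte string of length 1
$x += 0;                            # ...numified
my $as_number = hex_cbor $x;        # "03": the integer 3

# integers use the shortest form: 23 fits into the initial octet, 500 does not
my $small = hex_cbor 23;            # "17"
my $large = hex_cbor 500;           # "1901f4": 0x19 announces a two-octet integer

my $text = "abc";
utf8::upgrade $text;                # force a text string
my $upgraded = hex_cbor $text;      # "63616263"
```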
629   TYPE CASTS
630       EXPERIMENTAL: As an experimental extension, "CBOR::XS" allows you to
631       force specific CBOR types to be used when encoding. That allows you to
632       encode types not normally accessible (e.g. half floats) as well as
633       force string types even when "text_strings" is in effect.
634
635       Type forcing is done by calling a special "cast" function which keeps a
636       copy of the value and returns a new value that can be handed over to
637       any CBOR encoder function.
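       A minimal sketch of that copy behaviour, using "CBOR::XS::as_int" from
       the list below: the cast is taken while the variable holds the string
       "42", so a later assignment to the variable does not change what gets
       encoded:

```perl
use strict;
use warnings;
use CBOR::XS;

my $value = "42";                      # a plain Perl string
my $cast  = CBOR::XS::as_int $value;   # remembers a copy of "42"

$value = "99";                         # modifying the original afterwards ...
my $cbor = encode_cbor $cast;          # ... does not affect the cast

printf "%s\n", unpack "H*", $cbor;     # 182a (CBOR integer 42)
```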
638
639       The following casts are currently available (all of which are unary
640       operators, that is, have a prototype of "$"):
641
642       CBOR::XS::as_int $value
643           Forces the value to be encoded as some form of (basic, not bignum)
644           integer type.
645
646       CBOR::XS::as_text $value
647           Forces the value to be encoded as a (UTF-8) text value.
648
649       CBOR::XS::as_bytes $value
650           Forces the value to be encoded as a (binary) string value.
651
652           Example: encode a perl string as binary even though "text_strings"
653           is in effect.
654
655              CBOR::XS->new->text_strings->encode ([4, "text", CBOR::XS::as_bytes "bytevalue"]);
656
657       CBOR::XS::as_bool $value
658           Converts a Perl boolean (which can be any kind of scalar) into a
659           CBOR boolean. Exactly the same as, but shorter to write than:
660
661              $value ? Types::Serialiser::true : Types::Serialiser::false
662
663       CBOR::XS::as_float16 $value
664           Forces half-float (IEEE 754 binary16) encoding of the given value.
665
666       CBOR::XS::as_float32 $value
667           Forces single-float (IEEE 754 binary32) encoding of the given
668           value.
669
670       CBOR::XS::as_float64 $value
671           Forces double-float (IEEE 754 binary64) encoding of the given
672           value.
673
674       CBOR::XS::as_cbor $cbor_text
675           Not a type cast per-se, this type cast forces the argument to be
676           encoded as-is. This can be used to embed pre-encoded CBOR data.
677
678           Note that no checking on the validity of the $cbor_text is done -
679           it's the caller's responsibility to correctly encode values.
680
681       CBOR::XS::as_map [key => value...]
682           Treats the array reference as key/value pairs and outputs a CBOR map.
683           This allows you to generate CBOR maps with arbitrary key types (or,
684           if you don't care about semantics, duplicate keys or pairs in a
685           custom order), which is otherwise hard to do with Perl.
686
687           The single argument must be an array reference with an even number
688           of elements.
689
690           Note that only the reference to the array is copied, the array
691           itself is not. Modifications done to the array before calling an
692           encoding function will be reflected in the encoded output.
693
694           Example: encode a CBOR map with a string and an integer as keys.
695
696              encode_cbor CBOR::XS::as_map [string => "value", 5 => "value"]
697
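       The sketch below (default settings assumed) runs each cast from the
       list above on a small input and prints the raw bytes, which makes the
       forced CBOR type visible in the first byte. Note that the map value
       "a" is a plain Perl string without the UTF-8 flag and is therefore
       encoded as a byte string:

```perl
use strict;
use warnings;
use CBOR::XS;

# Encode small inputs through each cast; expected hex is noted per line.
my %encoded = (
   as_int     => encode_cbor (CBOR::XS::as_int "7"),           # 07
   as_text    => encode_cbor (CBOR::XS::as_text "hi"),         # 626869 (text)
   as_bytes   => encode_cbor (CBOR::XS::as_bytes "hi"),        # 426869 (bytes)
   as_bool    => encode_cbor (CBOR::XS::as_bool 1),            # f5 (true)
   as_float16 => encode_cbor (CBOR::XS::as_float16 1.5),       # f93e00
   as_float32 => encode_cbor (CBOR::XS::as_float32 1.5),       # fa3fc00000
   as_float64 => encode_cbor (CBOR::XS::as_float64 1.5),       # fb3ff8000000000000
   as_cbor    => encode_cbor ([1, CBOR::XS::as_cbor "\x02"]),  # 820102 (0x02 spliced in)
   as_map     => encode_cbor (CBOR::XS::as_map [1 => "a"]),    # a1014161 (integer key)
);

printf "%-10s %s\n", $_, unpack "H*", $encoded{$_} for sort keys %encoded;
```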
698   OBJECT SERIALISATION
699       This module implements both a CBOR-specific and the generic
700       Types::Serialiser object serialisation protocol. The following
701       subsections explain both methods.
702
703       ENCODING
704
705       This module knows two ways to serialise a Perl object: The CBOR-specific
706       way, and the generic way.
707
708       Whenever the encoder encounters a Perl object that it cannot serialise
709       directly (most of them), it will first look up the "TO_CBOR" method on
710       it.
711
712       If it has a "TO_CBOR" method, it will call it with the object as its
713       only argument, and expects exactly one return value, which it will
714       then encode in place of the object.
715
716       Otherwise, it will look up the "FREEZE" method. If it exists, it will
717       call it with the object as first argument, and the constant string
718       "CBOR" as the second argument, to distinguish it from other
719       serialisers.
720
721       The "FREEZE" method can return any number of values (i.e. zero or
722       more). These will be encoded as a CBOR perl object, together with the
723       classname.
724
725       These methods MUST NOT change the data structure that is being
726       serialised. Failure to comply with this can result in memory
727       corruption - and worse.
728
729       If an object supports neither "TO_CBOR" nor "FREEZE", encoding will
730       fail with an error.
731
732       DECODING
733
734       Objects encoded via "TO_CBOR" cannot (normally) be automatically
735       decoded, but objects encoded via "FREEZE" can be decoded using the
736       following protocol:
737
738       When an encoded CBOR perl object is encountered by the decoder, it will
739       look up the "THAW" method, by using the stored classname, and will fail
740       if the method cannot be found.
741
742       After the lookup it will call the "THAW" method with the stored
743       classname as first argument, the constant string "CBOR" as second
744       argument, and all values returned by "FREEZE" as remaining arguments.
745
746       EXAMPLES
747
748       Here is an example "TO_CBOR" method:
749
750          sub My::Object::TO_CBOR {
751             my ($obj) = @_;
752
753             ["this is a serialised My::Object object", $obj->{id}]
754          }
755
756       When a "My::Object" is encoded to CBOR, it will instead encode a simple
757       array with two members: a string, and the "object id". Decoding this
758       CBOR string will yield a normal perl array reference in place of the
759       object.
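       A short round trip illustrates this, assuming a hypothetical
       "My::Object" class implemented as a blessed hash; the "TO_CBOR" method
       from the example above is repeated so that the snippet runs on its
       own:

```perl
use strict;
use warnings;
use CBOR::XS;

# TO_CBOR method from the example above, for a hypothetical My::Object class.
sub My::Object::TO_CBOR {
   my ($obj) = @_;
   ["this is a serialised My::Object object", $obj->{id}]
}

my $obj     = bless { id => 17 }, "My::Object";
my $decoded = decode_cbor encode_cbor $obj;

# Only the substitute came back: an unblessed array reference.
print ref $decoded, "\n";       # ARRAY
print "$decoded->[1]\n";        # 17
```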
760
761       A more useful and practical example would be a serialisation method for
762       the URI module. CBOR has a custom tag value for URIs, namely 32:
763
764         sub URI::TO_CBOR {
765            my ($self) = @_;
766            my $uri = "$self"; # stringify uri
767            utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
768            CBOR::XS::tag 32, $uri
769         }
770
771       This will encode URIs as a UTF-8 string with tag 32, which indicates a
772       URI.
773
774       Decoding such a URI will not (currently) give you a URI object, but
775       instead a CBOR::XS::Tagged object with tag number 32 and the string -
776       exactly what was returned by "TO_CBOR".
777
778       To serialise an object so it can automatically be deserialised, you
779       need to use "FREEZE" and "THAW". To take the URI module as example,
780       this would be a possible implementation:
781
782          sub URI::FREEZE {
783             my ($self, $serialiser) = @_;
784             "$self" # encode url string
785          }
786
787          sub URI::THAW {
788             my ($class, $serialiser, $uri) = @_;
789             $class->new ($uri)
790          }
791
792       Unlike "TO_CBOR", multiple values can be returned by "FREEZE". For
793       example, a "FREEZE" method that returns "type", "id" and "variant"
794       values would cause an invocation of "THAW" with 5 arguments:
795
796          sub My::Object::FREEZE {
797             my ($self, $serialiser) = @_;
798
799             ($self->{type}, $self->{id}, $self->{variant})
800          }
801
802          sub My::Object::THAW {
803             my ($class, $serialiser, $type, $id, $variant) = @_;
804
805             $class->new (type => $type, id => $id, variant => $variant)
806          }
807
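       A complete round trip through the "FREEZE"/"THAW" protocol is
       sketched below with a hypothetical "My::Point" class (the class and
       its constructor are illustrative only). Unlike the "TO_CBOR" case,
       "decode_cbor" hands back a real "My::Point" object, because the
       encoded data records the class name and "THAW" rebuilds the object
       from the frozen values:

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical class, used only to illustrate the FREEZE/THAW protocol.
package My::Point {
   sub new { my ($class, %args) = @_; bless { %args }, $class }

   sub FREEZE {
      my ($self, $serialiser) = @_;   # $serialiser is the string "CBOR"
      ($self->{x}, $self->{y})        # any number of values may be returned
   }

   sub THAW {
      my ($class, $serialiser, $x, $y) = @_;
      $class->new (x => $x, y => $y)
   }
}

my $cbor  = encode_cbor (My::Point->new (x => 3, y => 4));
my $point = decode_cbor $cbor;          # THAW rebuilds the object

print ref ($point), " ($point->{x}, $point->{y})\n";   # My::Point (3, 4)
```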

MAGIC HEADER

809       There is no way to distinguish CBOR from other formats
810       programmatically. To make it easier to distinguish CBOR from other
811       formats, the CBOR specification has a special "magic string" that can
812       be prepended to any CBOR string without changing its meaning.
813
814       This string is available as $CBOR::XS::MAGIC. This module does not
815       prepend this string to the CBOR data it generates, but it will ignore
816       it if present, so users can prepend this string as a "file type"
817       indicator as required.
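       A minimal sketch of using the magic string as a file type indicator:
       the writer prepends it, a reader can test for it cheaply, and the
       decoder skips it automatically:

```perl
use strict;
use warnings;
use CBOR::XS;

# Writer: mark the data as CBOR by prepending the magic string.
my $file_data = $CBOR::XS::MAGIC . encode_cbor [1, 2, 3];

# Reader: cheap check before attempting to decode.
my $looks_like_cbor =
   substr ($file_data, 0, length $CBOR::XS::MAGIC) eq $CBOR::XS::MAGIC;

# The decoder ignores the magic string, so it need not be stripped.
my $data    = decode_cbor $file_data;
my $verdict = $looks_like_cbor ? "CBOR: @$data" : "not CBOR";
print "$verdict\n";                     # CBOR: 1 2 3
```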
818

THE CBOR::XS::Tagged CLASS

820       CBOR has the concept of tagged values - any CBOR value can be tagged
821       with a 64 bit unsigned number. These tag numbers are centrally administered.
822
823       "CBOR::XS" handles a few tags internally when en- or decoding. You can
824       also create tags yourself by encoding "CBOR::XS::Tagged" objects, and
825       the decoder will create "CBOR::XS::Tagged" objects itself when it hits
826       an unknown tag.
827
828       These objects are simply blessed array references - the first member of
829       the array being the numerical tag, the second being the value.
830
831       You can interact with "CBOR::XS::Tagged" objects in the following ways:
832
833       $tagged = CBOR::XS::tag $tag, $value
834           This function(!) creates a new "CBOR::XS::Tagged" object using the
835           given $tag (0..2**64-1) to tag the given $value (which can be any
836           Perl value that can be encoded in CBOR, including serialisable Perl
837           objects and "CBOR::XS::Tagged" objects).
838
839       $tagged->[0]
840       $tagged->[0] = $new_tag
841       $tag = $tagged->tag
842       $new_tag = $tagged->tag ($new_tag)
843           Access/mutate the tag.
844
845       $tagged->[1]
846       $tagged->[1] = $new_value
847       $value = $tagged->value
848       $new_value = $tagged->value ($new_value)
849           Access/mutate the tagged value.
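       The sketch below creates a tagged value, encodes and decodes it, and
       uses both the method and the array accessors. Tag 1234 is just an
       arbitrary example of a tag that "CBOR::XS" has no special handling
       for, so the decoder returns a "CBOR::XS::Tagged" object:

```perl
use strict;
use warnings;
use CBOR::XS;

# Tag 1234 serves only as an example of a tag without special handling.
my $tagged = CBOR::XS::tag (1234, "payload");
my $cbor   = encode_cbor $tagged;

my $back = decode_cbor $cbor;   # unknown tag: comes back as CBOR::XS::Tagged
printf "%s: tag %d, value %s\n", ref $back, $back->tag, $back->value;
# CBOR::XS::Tagged: tag 1234, value payload

$back->tag (4321);              # mutate through the method ...
print "$back->[0]\n";           # ... read through the array element: 4321
```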
850
851   EXAMPLES
852       Here are some examples of "CBOR::XS::Tagged" uses to tag objects.
853
854       You can look up CBOR tag values and meanings in the IANA registry at
855       <http://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>.
856
857       Prepend a magic header ($CBOR::XS::MAGIC):
858
859          my $cbor = encode_cbor CBOR::XS::tag 55799, $value;
860          # same as:
861          my $cbor = $CBOR::XS::MAGIC . encode_cbor $value;
862
863       Serialise some URIs and a regex in an array:
864
865          my $cbor = encode_cbor [
866             (CBOR::XS::tag 32, "http://www.nethype.de/"),
867             (CBOR::XS::tag 32, "http://software.schmorp.de/"),
868             (CBOR::XS::tag 35, "^[Pp][Ee][Rr][lL]\$"),
869          ];
870
871       Wrap CBOR data in CBOR:
872
873          my $cbor_cbor = encode_cbor
874             CBOR::XS::tag 24,
875                encode_cbor [1, 2, 3];
876

TAG HANDLING AND EXTENSIONS

878       This section describes how this module handles specific tagged values
879       and extensions. If a tag is not mentioned here and no additional
880       filters are provided for it, then the default handling applies
881       (creating a CBOR::XS::Tagged object on decoding, and only encoding the
882       tag when explicitly requested).
883
884       Tags not handled specifically are currently converted into a
885       CBOR::XS::Tagged object, which is simply a blessed array reference
886       consisting of the numeric tag value followed by the (decoded) CBOR
887       value.
888
889       Future versions of this module reserve the right to special case
890       additional tags (such as base64url).
891
892   ENFORCED TAGS
893       These tags are always handled when decoding, and their handling cannot
894       be overridden by the user.
895
896       26 (perl-object, <http://cbor.schmorp.de/perl-object>)
897           These tags are automatically created (and decoded) for serialisable
898           objects using the "FREEZE/THAW" methods (the Types::Serialiser
899           object serialisation protocol). See "OBJECT SERIALISATION" for
900           details.
901
902       28, 29 (shareable, sharedref, <http://cbor.schmorp.de/value-sharing>)
903           These tags are automatically decoded when encountered (and they do
904           not result in a cyclic data structure, see "allow_cycles"),
905           resulting in shared values in the decoded object. They are only
906           encoded, however, when "allow_sharing" is enabled.
907
908           Not all shared values can be successfully decoded: values that
909           reference themselves will currently decode as "undef" (this is not
910           the same as a reference pointing to itself, which will be
911           represented as a value that contains an indirect reference to
912           itself - these will be decoded properly).
913
914           Note that considerably more shared value data structures can be
915           decoded than will be encoded - currently, only values pointed to by
916           references will be shared, others will not. While non-reference
917           shared values can be generated in Perl with some effort, they were
918           considered too unimportant to be supported in the encoder. The
919           decoder, however, will decode these values as shared values.
920
921       256, 25 (stringref-namespace, stringref,
922       <http://cbor.schmorp.de/stringref>)
923           These tags are automatically decoded when encountered. They are
924           only encoded, however, when "pack_strings" is enabled.
925
926       22098 (indirection, <http://cbor.schmorp.de/indirection>)
927           This tag is automatically generated when a reference is
928           encountered (with the exception of hash and array references). It
929           is converted to a reference when decoding.
930
931       55799 (self-describe CBOR, RFC 7049)
932           This value is not generated on encoding (unless explicitly
933           requested by the user), and is simply ignored when decoding.
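       The following sketch (sizes are only compared, not hard-coded) shows
       both opt-in encodings: "allow_sharing" emits tags 28/29 so that two
       references to the same hash survive a round trip as one shared hash,
       and "pack_strings" emits tags 256/25 so that repeated strings are
       replaced by short back-references:

```perl
use strict;
use warnings;
use CBOR::XS;

# Two references to one and the same hash.
my $shared = { name => "shared" };
my $data   = [$shared, $shared];

my $plain   = CBOR::XS->new->encode ($data);                  # hash written twice
my $sharing = CBOR::XS->new->allow_sharing->encode ($data);   # tags 28/29
my $decoded = decode_cbor $sharing;

printf "plain: %d octets, allow_sharing: %d octets\n", length $plain, length $sharing;
print "both entries decode to the same hash\n" if $decoded->[0] == $decoded->[1];

# Ten copies of one string, with and without stringrefs (tags 256/25).
my @words  = ("repetitive") x 10;
my $normal = encode_cbor \@words;
my $packed = CBOR::XS->new->pack_strings->encode (\@words);
printf "strings: %d octets normal, %d octets packed\n", length $normal, length $packed;
```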
934
935   NON-ENFORCED TAGS
936       These tags have default filters provided when decoding. Their handling
937       can be overridden by changing the %CBOR::XS::FILTER entry for the tag,
938       or by providing a custom "filter" callback when decoding.
939
940       When they result in decoding into a specific Perl class, the module
941       usually provides a corresponding "TO_CBOR" method as well.
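       A minimal sketch of both ways to override a non-enforced tag, using
       tag 32 (URI) so that no URI module is needed. It assumes, as described
       for the "filter" method, that a filter callback (and a
       %CBOR::XS::FILTER entry) receives the tag and the decoded value, and
       that a callback returns either one replacement value or an empty list,
       which leaves the value as a "CBOR::XS::Tagged" object:

```perl
use strict;
use warnings;
use CBOR::XS;

my $cbor = encode_cbor (CBOR::XS::tag (32, "http://www.example.com/"));

# 1. Replace the default converter for tag 32, here only for this block.
my $as_string;
{
   local $CBOR::XS::FILTER{32} = sub {
      my ($tag, $value) = @_;
      $value                        # keep the URI as a plain string
   };
   $as_string = decode_cbor $cbor;
}

# 2. Install a filter callback on one coder object only.
my $coder = CBOR::XS->new->filter (sub {
   my ($tag, $value) = @_;
   $tag == 32 ? "<$value>" : ()     # empty list: other tags stay CBOR::XS::Tagged
});
my $wrapped = $coder->decode ($cbor);

print "$as_string\n$wrapped\n";
```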
942
943       When any of these need to load additional modules that are not part of
944       the perl core distribution (e.g. URI), it is (currently) up to the user
945       to provide these modules. The decoding usually fails with an exception
946       if the required module cannot be loaded.
947
948       0, 1 (date/time string, seconds since the epoch)
949           These tags are decoded into Time::Piece objects. The corresponding
950           "Time::Piece::TO_CBOR" method always encodes into tag 1 values
951           currently.
952
953           The Time::Piece API is generally surprisingly bad, and fractional
954           seconds are only accidentally kept intact, so watch out. On the
955           plus side, the module comes with perl since 5.10, which has to
956           count for something.
957
958       2, 3 (positive/negative bignum)
959           These tags are decoded into Math::BigInt objects. The corresponding
960           "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal
961           CBOR integers, and others into positive/negative CBOR bignums.
962
963       4, 5, 264, 265 (decimal fraction/bigfloat)
964           Both decimal fractions and bigfloats are decoded into
965           Math::BigFloat objects. The corresponding "Math::BigFloat::TO_CBOR"
966           method always encodes into a decimal fraction (either tag 4 or
967           264).
968
969           NaN and infinities are not encoded properly, as they cannot be
970           represented as CBOR decimal fractions.
971
972           See "BIGNUM SECURITY CONSIDERATIONS" for more info.
973
974       30 (rational numbers)
975           These tags are decoded into Math::BigRat objects. The corresponding
976           "Math::BigRat::TO_CBOR" method encodes rational numbers with
977           denominator 1 via their numerator only, i.e., they become normal
978           integers or "bignums".
979
980           See "BIGNUM SECURITY CONSIDERATIONS" for more info.
981
982       21, 22, 23 (expected later JSON conversion)
983           CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore
984           these tags.
985
986       32 (URI)
987           These objects decode into URI objects. The corresponding
988           "URI::TO_CBOR" method again results in a CBOR URI value.
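       As a sketch of the default bignum handling (Math::BigInt is part of
       the perl core distribution): a number too large for a 64 bit CBOR
       integer is encoded by "Math::BigInt::TO_CBOR" as a tag 2 bignum, whose
       first byte is 0xc2, and decodes back into a Math::BigInt object:

```perl
use strict;
use warnings;
use CBOR::XS;
use Math::BigInt;

my $big  = Math::BigInt->new ("123456789012345678901234567890");   # > 2**64
my $cbor = encode_cbor $big;              # via Math::BigInt::TO_CBOR

printf "first byte: %02x\n", ord $cbor;   # first byte: c2 (tag 2)

my $back = decode_cbor $cbor;             # default converter for tag 2
print ref ($back), " $back\n";            # Math::BigInt 123456789012345678901234567890
```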
989

CBOR and JSON

991       CBOR is supposed to implement a superset of the JSON data model, and
992       is, with some coercion, able to represent all JSON texts (something
993       that other "binary JSON" formats such as BSON generally do not
994       support).
995
996       CBOR implements some extra hints and support for JSON interoperability,
997       and the spec offers further guidance for conversion between CBOR and
998       JSON. None of this is currently implemented in CBOR::XS, and the guidelines
999       in the spec do not result in correct round-tripping of data. If JSON
1000       interoperability is improved in the future, then the goal will be to
1001       ensure that decoded JSON data will round-trip encoding and decoding to
1002       CBOR intact.
1003

SECURITY CONSIDERATIONS

1005       TL;DR: if you want to decode or encode CBOR from untrusted sources,
1006       you should start with a coder object created via "new_safe" (which
1007       implements the mitigations explained below):
1008
1009          my $coder = CBOR::XS->new_safe;
1010
1011          my $data = $coder->decode ($cbor_text);
1012          my $cbor = $coder->encode ($data);
1013
1014       Longer version: When you are using CBOR in a protocol, talking to
1015       untrusted potentially hostile creatures requires some thought:
1016
1017       Security of the CBOR decoder itself
1018           First and foremost, your CBOR decoder should be secure, that is,
1019           should not have any buffer overflows or similar bugs that could
1020           potentially be exploited. Obviously, this module should ensure that
1021           and I am trying hard to make that true, but you never know.
1022
1023       CBOR::XS can invoke almost arbitrary callbacks during decoding
1024           CBOR::XS supports object serialisation - decoding CBOR can cause
1025           calls to any "THAW" method in any package that exists in your
1026           process (that is, CBOR::XS will not try to load modules, but any
1027           existing "THAW" method or function can be called, so they all have
1028           to be secure).
1029
1030           Less obviously, it will also invoke "TO_CBOR" and "FREEZE" methods
1031           - even if all your "THAW" methods are secure, encoding data
1032           structures from untrusted sources can invoke those and trigger bugs
1033           in those.
1034
1035           So, if you are not sure about the security of all the modules you
1036           have loaded (you shouldn't), you should disable this part using
1037           "forbid_objects" or using "new_safe".
1038
1039       CBOR can be extended with tags that call library code
1040           CBOR can be extended with tags, and "CBOR::XS" has a registry of
1041           conversion functions for many existing tags that can be extended
1042           via third-party modules (see the "filter" method).
1043
1044           If you don't trust these, you should configure the "safe" filter
1045           function, "CBOR::XS::safe_filter" ("new_safe" does this), which by
1046           default only includes conversion functions that are considered
1047           "safe" by the author (but again, they can be extended by third
1048           party modules).
1049
1050           Depending on your level of paranoia, you can use the "safe" filter:
1051
1052              $cbor->filter (\&CBOR::XS::safe_filter);
1053
1054           ... your own filter...
1055
1056              $cbor->filter (sub { ... do your stuff here ... });
1057
1058           ... or even no filter at all, disabling all tag decoding:
1059
1060              $cbor->filter (sub { });
1061
1062           This is never a problem for encoding, as the tag mechanism only
1063           exists in CBOR texts.
1064
1065       Resource-starving attacks: object memory usage
1066           You need to avoid resource-starving attacks. That means you should
1067           limit the size of CBOR data you accept, or make sure that when your
1068           resources run out, that's just fine (e.g. by using a separate
1069           process that can crash safely). The size of a CBOR string in octets
1070           is usually a good indication of the size of the resources required
1071           to decode it into a Perl structure. While CBOR::XS can check the
1072           size of the CBOR text (using "max_size" - done by "new_safe"), it
1073           might be too late when you already have it in memory, so you might
1074           want to check the size before you accept the string.
1075
1076           As for encoding, it is possible to construct data structures that
1077           are relatively small but result in large CBOR texts (for example by
1078           having an array full of references to the same big data structure,
1079           which will all be deep-cloned during encoding by default). This is
1080           rarely an actual issue (and the worst case is still just running
1081           out of memory), but you can reduce this risk by using
1082           "allow_sharing".
1083
1084       Resource-starving attacks: stack overflows
1085           CBOR::XS recurses using the C stack when decoding objects and
1086           arrays. The C stack is a limited resource: for instance, on my
1087           amd64 machine with 8MB of stack size I can decode around 180k
1088           nested arrays but only 14k nested CBOR objects (due to perl itself
1089           recursing deeply on croak to free the temporary). If that is
1090           exceeded, the program crashes. To be conservative, the default
1091           nesting limit is set to 512. If your process has a smaller stack,
1092           you should adjust this setting accordingly with the "max_depth"
1093           method.
1094
1095       Resource-starving attacks: CPU en-/decoding complexity
1096           CBOR::XS will use the Math::BigInt, Math::BigFloat and Math::BigRat
1097           libraries to encode and decode bignums. These can be very
1098           slow (as in, centuries of CPU time) and can even crash your program
1099           (and are generally not very trustworthy). See the next section on
1100           bignum security for details.
1101
1102       Data breaches: leaking information in error messages
1103           CBOR::XS might leak contents of your Perl data structures in its
1104           error messages, so when you serialise sensitive information you
1105           might want to make sure that exceptions thrown by CBOR::XS will not
1106           end up in front of untrusted eyes.
1107
1108       Something else...
1109           Something else could bomb you, too, that I forgot to think of. In
1110           that case, you get to keep the pieces. I am always open for hints,
1111           though...
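       Several of these mitigations can be tried out in isolation. The sketch
       below (the class "My::Hypothetical" and the chosen limits are
       illustrative assumptions) shows "max_depth" and "max_size" rejecting
       input, and "forbid_objects" preventing a "THAW" method from being
       called. "new_safe" applies these kinds of mitigations in one go; its
       exact defaults are not reproduced here:

```perl
use strict;
use warnings;
use CBOR::XS;

# max_depth: 40 nested one-element arrays ("\x81" = array of length 1).
my $deep       = ("\x81" x 40) . "\x00";
my $depth_ok   = eval { CBOR::XS->new->max_depth (16)->decode ($deep); 1 };
my $default_ok = eval { CBOR::XS->new->decode ($deep); 1 };   # default limit is 512
print "max_depth (16) rejected the input: $@" unless $depth_ok;

# max_size: refuse CBOR texts longer than the given number of octets.
my $long_text = encode_cbor ("x" x 100);                       # 102 octets
my $size_ok   = eval { CBOR::XS->new->max_size (64)->decode ($long_text); 1 };
print "max_size (64) rejected the input: $@" unless $size_ok;

# forbid_objects: THAW methods are not called while decoding.
my $thaw_calls = 0;
package My::Hypothetical {              # hypothetical class for this sketch only
   sub FREEZE { ("state") }
   sub THAW   { $thaw_calls++; bless {}, $_[0] }
}

my $frozen = encode_cbor (bless {}, "My::Hypothetical");      # uses FREEZE
my $safe   = CBOR::XS->new->forbid_objects;
eval { $safe->decode ($frozen) };
print "THAW calls with forbid_objects: $thaw_calls\n";        # 0

# ... and such objects are not encoded via FREEZE either.
my $encode_ok = eval { $safe->encode (bless {}, "My::Hypothetical"); 1 };
print "forbid_objects refused to encode the object\n" unless $encode_ok;
```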
1112

BIGNUM SECURITY CONSIDERATIONS

1114       CBOR::XS provides a "TO_CBOR" method for both Math::BigInt and
1115       Math::BigFloat that tries to encode the number in the simplest possible
1116       way, that is, either a CBOR integer, a CBOR bigint/decimal fraction
1117       (tag 4) or an arbitrary-exponent decimal fraction (tag 264). Rational
1118       numbers (Math::BigRat, tag 30) can also contain bignums as members.
1119
1120       CBOR::XS will also understand base-2 bigfloat or arbitrary-exponent
1121       bigfloats (tags 5 and 265), but it will never generate these on its
1122       own.
1123
1124       Using the built-in Math::BigInt::Calc support, encoding and decoding
1125       decimal fractions is generally fast. Decoding bigints can be slow for
1126       very big numbers (tens of thousands of digits, something that could
1127       potentially be caught by limiting the size of CBOR texts), and decoding
1128       bigfloats or arbitrary-exponent bigfloats can be extremely slow
1129       (minutes, decades) for large exponents (roughly 40 bit and longer).
1130
1131       Additionally, Math::BigInt can take advantage of other bignum
1132       libraries, such as Math::GMP, which cannot handle big floats with large
1133       exponents, and might simply abort or crash your program, due to their
1134       code quality.
1135
1136       This can be a concern if you want to parse untrusted CBOR. If it is,
1137       you might want to disable decoding of tag 2 (bigint) and 3 (negative
1138       bigint) types. You should also disable types 5 and 265, as these can be
1139       slow even without bigints.
1140
1141       Disabling bigints will also partially or fully disable types that rely
1142       on them, e.g. rational numbers that use bignums.
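       One way to disable these tags is a custom "filter" callback. The
       sketch below leaves tags 2, 3, 5 and 265 undecoded (as
       "CBOR::XS::Tagged" objects) and passes every other tag to the default
       converters in %CBOR::XS::FILTER. It assumes, as in the module's filter
       documentation, that those converters are code references called with
       the tag and the value; the hash %untrusted_bignum_tag is made up for
       this example:

```perl
use strict;
use warnings;
use CBOR::XS;
use Math::BigInt;

# Hypothetical set of tags we do not want to decode from untrusted input.
my %untrusted_bignum_tag = (2 => 1, 3 => 1, 5 => 1, 265 => 1);

my $coder = CBOR::XS->new->filter (sub {
   my ($tag, $value) = @_;
   return if $untrusted_bignum_tag{$tag};        # no value: stays CBOR::XS::Tagged
   my $cb = $CBOR::XS::FILTER{$tag} or return;    # no default converter: same
   $cb->($tag, $value)                            # otherwise use the default one
});

my $cbor    = encode_cbor (Math::BigInt->new ("1" . "0" x 40));   # 10**40, tag 2
my $decoded = $coder->decode ($cbor);

printf "%s, tag %d\n", ref $decoded, $decoded->tag;   # CBOR::XS::Tagged, tag 2
```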
1143

CBOR IMPLEMENTATION NOTES

1145       This section contains some random implementation notes. They do not
1146       describe guaranteed behaviour, but merely behaviour as-is implemented
1147       right now.
1148
1149       64 bit integers are only properly decoded when Perl was built with 64
1150       bit support.
1151
1152       Strings and arrays are encoded with a definite length. Hashes as well,
1153       unless they are tied (or otherwise magical).
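       For example, encoding a three-element array produces the
       definite-length header byte 0x83. The sketch also assumes, as for any
       conforming CBOR decoder, that indefinite-length input (0x9f ... 0xff)
       produced by other encoders is still accepted when decoding:

```perl
use strict;
use warnings;
use CBOR::XS;

# Encoding: arrays carry their length up front (0x83 = array of three items).
my $definite = encode_cbor [1, 2, 3];
print unpack ("H*", $definite), "\n";   # 83010203

# Decoding: an indefinite-length array (0x9f ... 0xff break) is accepted too.
my $decoded = decode_cbor "\x9f\x01\x02\x03\xff";
print "@$decoded\n";                    # 1 2 3
```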
1154
1155       Only the double data type is supported for NV data types - when Perl
1156       uses long double to represent floating point values, they might not be
1157       encoded properly. Half precision values are decoded, but only encoded via "CBOR::XS::as_float16".
1158
1159       Strict mode and canonical mode are not implemented.
1160

LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT

1162       On perls that were built without 64 bit integer support (these are rare
1163       nowadays, even on 32 bit architectures, as all major Perl distributions
1164       are built with 64 bit integer support), support for any kind of 64 bit
1165       value in CBOR is very limited - most likely, these 64 bit values will
1166       be truncated, corrupted, or otherwise not decoded correctly. This also
1167       includes string, float, array and map sizes that are stored as 64 bit
1168       integers.
1169

THREADS

1171       This module is not guaranteed to be thread safe and there are no plans
1172       to change this until Perl gets thread support (as opposed to the
1173       horribly slow so-called "threads" which are simply slow and bloated
1174       process simulations - use fork, it's much faster, cheaper, better).
1175
1176       (It might actually work, but you have been warned).
1177

BUGS

1179       While the goal of this module is to be correct, that unfortunately does
1180       not mean it's bug-free, only that I think its design is bug-free. If
1181       you keep reporting bugs they will be fixed swiftly, though.
1182
1183       Please refrain from using rt.cpan.org or any other bug reporting
1184       service. I put the contact address into my modules for a reason.
1185

AUTHOR

1194        Marc Lehmann <schmorp@schmorp.de>
1195        http://home.schmorp.de/
1196
1197
1198
1199perl v5.36.0                      2023-01-20                             XS(3)