XS(3) User Contributed Perl Documentation XS(3)

6 CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)
7
    use CBOR::XS;

    $binary_cbor_data = encode_cbor $perl_value;
    $perl_value       = decode_cbor $binary_cbor_data;

    # OO-interface

    $coder = CBOR::XS->new;
    $binary_cbor_data = $coder->encode ($perl_value);
    $perl_value       = $coder->decode ($binary_cbor_data);

    # prefix decoding

    my $many_cbor_strings = ...;
    while (length $many_cbor_strings) {
       # reuse the $coder object created above
       my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
       # data was decoded
       substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
    }
28
30 This module converts Perl data structures to the Concise Binary Object
31 Representation (CBOR) and vice versa. CBOR is a fast binary
32 serialisation format that aims to use an (almost) superset of the JSON
33 data model, i.e. when you can represent something useful in JSON, you
34 should be able to represent it in CBOR.
35
36 In short, CBOR is a faster and quite compact binary alternative to
37 JSON, with the added ability of supporting serialisation of Perl
38 objects. (JSON often compresses better than CBOR though, so if you plan
39 to compress the data later and speed is less important you might want
40 to compare both formats first).
41
42 The primary goal of this module is to be correct and the secondary goal
43 is to be fast. To reach the latter goal it was written in C.
44
45 To give you a general idea about speed, with texts in the megabyte
46 range, "CBOR::XS" usually encodes roughly twice as fast as Storable or
47 JSON::XS and decodes about 15%-30% faster than those. The shorter the
48 data, the worse Storable performs in comparison.
49
50 Regarding compactness, "CBOR::XS"-encoded data structures are usually
51 about 20% smaller than the same data encoded as (compact) JSON or
52 Storable.
53
54 In addition to the core CBOR data format, this module implements a
55 number of extensions, to support cyclic and shared data structures (see
56 "allow_sharing" and "allow_cycles"), string deduplication (see
57 "pack_strings") and scalar references (always enabled).
58
59 See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
60 vice versa.
61
63 The following convenience methods are provided by this module. They are
64 exported by default:
65
66 $cbor_data = encode_cbor $perl_scalar
67 Converts the given Perl data structure to CBOR representation.
68 Croaks on error.
69
70 $perl_scalar = decode_cbor $cbor_data
71 The opposite of "encode_cbor": expects a valid CBOR string to
72 parse, returning the resulting perl scalar. Croaks on error.
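
As a minimal sketch of the functional interface (the sample data and
variable names are illustrative only), a round trip with basic error
handling could look like this:

```perl
use strict;
use warnings;
use CBOR::XS;

# Illustrative data; any structure of hashes, arrays and scalars works.
my $perl_value = { name => "example", list => [1, 2, 3] };

my $cbor_data = encode_cbor ($perl_value);   # binary string
my $copy      = decode_cbor ($cbor_data);    # independent copy of the data

# Both functions croak on error, so wrap calls on untrusted input in eval.
# Dropping the last octet leaves an incomplete CBOR value.
my $truncated = substr $cbor_data, 0, length ($cbor_data) - 1;
my $ok        = eval { decode_cbor ($truncated); 1 };
my $error     = $ok ? undef : $@;
```

After the failed call, $error holds the message the decoder croaked with.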
73
75 The object oriented interface lets you configure your own encoding or
76 decoding style, within the limits of supported formats.
77
78 $cbor = new CBOR::XS
79 Creates a new CBOR::XS object that can be used to de/encode CBOR
80 strings. All boolean flags described below are by default disabled.
81
82 The mutators for flags all return the CBOR object again and thus
83 calls can be chained:
84
85 my $cbor = CBOR::XS->new->encode ({a => [1,2]});
86
87 $cbor = new_safe CBOR::XS
88 Create a new, safe/secure CBOR::XS object. This is similar to
89 "new", but configures the coder object to be safe to use with
90 untrusted data. Currently, this is equivalent to:
91
92 my $cbor = CBOR::XS
93 ->new
94 ->validate_utf8
95 ->forbid_objects
96 ->filter (\&CBOR::XS::safe_filter)
97 ->max_size (1e8);
98
99 But is more future proof (it is better to crash because of a change
100 than to be exploited in other ways).
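
The following sketch illustrates one effect of "new_safe", assuming a
module version that provides it. The class "My::Token" is hypothetical
and exists only for this example. Because "new_safe" enables
"forbid_objects", a serialised object is expected to come back as plain
tagged data instead of being passed to "THAW":

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical class, used only for this illustration.
package My::Token {
    sub new    { my ($class, $id) = @_; my $self = { id => $id }; bless $self, $class }
    sub FREEZE { my ($self, $serialiser) = @_; $self->{id} }
    sub THAW   { my ($class, $serialiser, $id) = @_; $class->new ($id) }
}

# A peer serialises an object using the default (permissive) settings.
my $cbor_data = CBOR::XS->new->encode (My::Token->new (42));

# The default coder calls My::Token->THAW and rebuilds the object.
my $thawed = CBOR::XS->new->decode ($cbor_data);

# The safe coder does not call THAW; the perl-object tag stays inert data.
my $inert = CBOR::XS->new_safe->decode ($cbor_data);
my $class = ref $inert;   # a CBOR::XS::Tagged object is expected here
```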
101
102 $cbor = $cbor->max_depth ([$maximum_nesting_depth])
103 $max_depth = $cbor->get_max_depth
104 Sets the maximum nesting level (default 512) accepted while
105 encoding or decoding. If a higher nesting level is detected in CBOR
106 data or a Perl data structure, then the encoder and decoder will
107 stop and croak at that point.
108
Nesting level is defined by the number of hash or array references
that the encoder needs to traverse to reach a given point, or by the
number of CBOR arrays and maps that enclose a given value in the
data being decoded.
113
114 Setting the maximum depth to one disallows any nesting, so that
115 ensures that the object is only a single hash/object or array.
116
117 If no argument is given, the highest possible setting will be used,
118 which is rarely useful.
119
120 Note that nesting is implemented by recursion in C. The default
121 value has been chosen to be as large as typical operating systems
122 allow without crashing.
123
124 See "SECURITY CONSIDERATIONS", below, for more info on why this is
125 useful.
126
127 $cbor = $cbor->max_size ([$maximum_string_size])
128 $max_size = $cbor->get_max_size
Sets the maximum length (in bytes) that a CBOR string may have when
decoding is attempted. The default is 0, meaning no limit. When
"decode" is called on a string that is longer than this many bytes,
it will not attempt to decode the string but will throw an
exception. This setting has no effect on "encode" (yet).
134
135 If no argument is given, the limit check will be deactivated (same
136 as when 0 is specified).
137
138 See "SECURITY CONSIDERATIONS", below, for more info on why this is
139 useful.
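
A small sketch of both limits in action. The limits are deliberately
tiny and purely illustrative; real applications would choose values that
match their data:

```perl
use strict;
use warnings;
use CBOR::XS;

# Illustrative limits, chosen small so that the effect is easy to see.
my $coder = CBOR::XS->new->max_depth (2)->max_size (16);

# A flat array stays within the depth limit.
my $flat_ok = eval { $coder->encode ([1, 2, 3]); 1 };

# Five nested arrays exceed a maximum depth of two, so encode croaks.
my $deep_ok = eval { $coder->encode ([[[[[1]]]]]); 1 };

# The encoded form of a 31-octet string is longer than 16 octets,
# so decode refuses to parse it.
my $big     = encode_cbor ("x" x 31);
my $size_ok = eval { $coder->decode ($big); 1 };
```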
140
141 $cbor = $cbor->allow_unknown ([$enable])
142 $enabled = $cbor->get_allow_unknown
143 If $enable is true (or missing), then "encode" will not throw an
144 exception when it encounters values it cannot represent in CBOR
145 (for example, filehandles) but instead will encode a CBOR "error"
146 value.
147
148 If $enable is false (the default), then "encode" will throw an
149 exception when it encounters anything it cannot encode as CBOR.
150
151 This option does not affect "decode" in any way, and it is
152 recommended to leave it off unless you know your communications
153 partner.
154
155 $cbor = $cbor->allow_sharing ([$enable])
156 $enabled = $cbor->get_allow_sharing
157 If $enable is true (or missing), then "encode" will not double-
158 encode values that have been referenced before (e.g. when the same
159 object, such as an array, is referenced multiple times), but
160 instead will emit a reference to the earlier value.
161
162 This means that such values will only be encoded once, and will not
163 result in a deep cloning of the value on decode, in decoders
164 supporting the value sharing extension. This also makes it possible
165 to encode cyclic data structures (which need "allow_cycles" to be
166 enabled to be decoded by this module).
167
168 It is recommended to leave it off unless you know your
169 communication partner supports the value sharing extensions to CBOR
170 (<http://cbor.schmorp.de/value-sharing>), as without decoder
171 support, the resulting data structure might be unusable.
172
Detecting shared values incurs a runtime overhead when values are
encoded that have a reference count larger than one, and might
unnecessarily increase the encoded size, as potentially shared
values are encoded as shareable whether or not they are actually
shared.
178
179 At the moment, only targets of references can be shared (e.g.
180 scalars, arrays or hashes pointed to by a reference). Weirder
181 constructs, such as an array with multiple "copies" of the same
182 string, which are hard but not impossible to create in Perl, are
183 not supported (this is the same as with Storable).
184
185 If $enable is false (the default), then "encode" will encode shared
186 data structures repeatedly, unsharing them in the process. Cyclic
187 data structures cannot be encoded in this mode.
188
189 This option does not affect "decode" in any way - shared values and
190 references will always be decoded properly if present.
191
192 $cbor = $cbor->allow_cycles ([$enable])
193 $enabled = $cbor->get_allow_cycles
194 If $enable is true (or missing), then "decode" will happily decode
195 self-referential (cyclic) data structures. By default these will
196 not be decoded, as they need manual cleanup to avoid memory leaks,
197 so code that isn't prepared for this will not leak memory.
198
199 If $enable is false (the default), then "decode" will throw an
200 error when it encounters a self-referential/cyclic data structure.
201
FUTURE DIRECTION: the motivation behind this option is to avoid
real cycles. Future versions of this module might choose to decode
cyclic data structures using weak references when this option is
off, instead of throwing an error.
206
207 This option does not affect "encode" in any way - shared values and
208 references will always be encoded properly if present.
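
The interplay of "allow_sharing" and "allow_cycles" can be sketched as
follows. The data structures are made up for the example, and the
manual cleanup at the end reflects the memory-leak caveat mentioned
above:

```perl
use strict;
use warnings;
use CBOR::XS;
use Scalar::Util qw(refaddr);

my $shared = [1, 2, 3];
my $data   = { first => $shared, second => $shared };

# Without allow_sharing the array is encoded twice and decoded as two copies.
my $plain = decode_cbor (encode_cbor ($data));

# With allow_sharing both hash values refer to one array after decoding.
my $coder = CBOR::XS->new->allow_sharing;
my $copy  = $coder->decode ($coder->encode ($data));

# A cyclic structure needs allow_sharing to encode and allow_cycles to decode.
my $node = { name => "loop" };
$node->{self} = $node;
my $cyclic_cbor = $coder->encode ($node);

my $rejected  = !eval { CBOR::XS->new->decode ($cyclic_cbor); 1 };
my $cycle     = CBOR::XS->new->allow_cycles->decode ($cyclic_cbor);
my $is_cyclic = refaddr ($cycle->{self}) == refaddr ($cycle);

# Break the cycles manually so that perl can free the memory.
delete $cycle->{self};
delete $node->{self};
```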
209
210 $cbor = $cbor->forbid_objects ([$enable])
211 $enabled = $cbor->get_forbid_objects
212 Disables the use of the object serialiser protocol.
213
If $enable is true (or missing), then "encode" will throw an
exception when it encounters perl objects that would be encoded
using the perl-object tag (26). When "decode" encounters such tags,
it will fall back to the general filter/tagged logic as if this
were an unknown tag (by default resulting in a "CBOR::XS::Tagged"
object).
220
221 If $enable is false (the default), then "encode" will use the
222 Types::Serialiser object serialisation protocol to serialise
223 objects into perl-object tags, and "decode" will do the same to
224 decode such tags.
225
226 See "SECURITY CONSIDERATIONS", below, for more info on why
227 forbidding this protocol can be useful.
228
229 $cbor = $cbor->pack_strings ([$enable])
230 $enabled = $cbor->get_pack_strings
If $enable is true (or missing), then "encode" will try not to
encode the same string twice, but will instead encode a reference
to the string. Depending on your data format, this can save a lot
of space, but also results in a very large runtime overhead
(expect encoding times to be 2-4 times as high as without).
236
237 It is recommended to leave it off unless you know your
238 communications partner supports the stringref extension to CBOR
239 (<http://cbor.schmorp.de/stringref>), as without decoder support,
240 the resulting data structure might not be usable.
241
242 If $enable is false (the default), then "encode" will encode
243 strings the standard CBOR way.
244
245 This option does not affect "decode" in any way - string references
246 will always be decoded properly if present.
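
To get a feeling for the space savings, the following sketch encodes an
array of identical strings with and without "pack_strings". The sample
string is arbitrary, and the actual sizes depend on your data:

```perl
use strict;
use warnings;
use CBOR::XS;

# One hundred copies of the same (arbitrary) string.
my $data = [ ("a-frequently-repeated-string") x 100 ];

my $plain  = CBOR::XS->new->encode ($data);
my $packed = CBOR::XS->new->pack_strings->encode ($data);

printf "plain: %d octets, packed: %d octets\n",
       length ($plain), length ($packed);

# String references are resolved on decode without any special option.
my $copy = CBOR::XS->new->decode ($packed);
```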
247
248 $cbor = $cbor->text_keys ([$enable])
249 $enabled = $cbor->get_text_keys
If $enable is true (or missing), then "encode" will encode all
perl hash keys as CBOR text strings (UTF-8 strings), upgrading them
as needed.
253
254 If $enable is false (the default), then "encode" will encode hash
255 keys normally - upgraded perl strings (strings internally encoded
256 as UTF-8) as CBOR text strings, and downgraded perl strings as CBOR
257 byte strings.
258
259 This option does not affect "decode" in any way.
260
261 This option is useful for interoperability with CBOR decoders that
262 don't treat byte strings as a form of text. It is especially useful
263 as Perl gives very little control over hash keys.
264
265 Enabling this option can be slow, as all downgraded hash keys that
266 are encoded need to be scanned and converted to UTF-8.
267
268 $cbor = $cbor->text_strings ([$enable])
269 $enabled = $cbor->get_text_strings
270 This option works similar to "text_keys", above, but works on all
271 strings (including hash keys), so "text_keys" has no further effect
272 after enabling "text_strings".
273
If $enable is true (or missing), then "encode" will encode all
perl strings as CBOR text strings (UTF-8 strings), upgrading them
as needed.
277
278 If $enable is false (the default), then "encode" will encode
279 strings normally (but see "text_keys") - upgraded perl strings
280 (strings internally encoded as UTF-8) as CBOR text strings, and
281 downgraded perl strings as CBOR byte strings.
282
283 This option does not affect "decode" in any way.
284
285 This option has similar advantages and disadvantages as
286 "text_keys". In addition, this option effectively removes the
287 ability to automatically encode byte strings, which might break
288 some "FREEZE" and "TO_CBOR" methods that rely on this.
289
290 A workaround is to use explicit type casts, which are unaffected by
291 this option.
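
The difference between the default behaviour, "text_keys" and
"text_strings" is easiest to see in a hex dump of the encoded data. The
sketch below assumes a module version that provides both options and
uses a one-entry hash of plain ASCII (downgraded) strings:

```perl
use strict;
use warnings;
use CBOR::XS;

my $value = { a => "b" };

# 0x41 introduces a one-octet byte string, 0x61 a one-octet text string.
my $default      = unpack "H*", CBOR::XS->new->encode ($value);
my $text_keys    = unpack "H*", CBOR::XS->new->text_keys->encode ($value);
my $text_strings = unpack "H*", CBOR::XS->new->text_strings->encode ($value);

# $default      is "a141614162": key and value are byte strings
# $text_keys    is "a161614162": only the key becomes a text string
# $text_strings is "a161616162": key and value are text strings
```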
292
293 $cbor = $cbor->validate_utf8 ([$enable])
294 $enabled = $cbor->get_validate_utf8
295 If $enable is true (or missing), then "decode" will validate that
296 elements (text strings) containing UTF-8 data in fact contain valid
297 UTF-8 data (instead of blindly accepting it). This validation
298 obviously takes extra time during decoding.
299
300 The concept of "valid UTF-8" used is perl's concept, which is a
301 superset of the official UTF-8.
302
303 If $enable is false (the default), then "decode" will blindly
304 accept UTF-8 data, marking them as valid UTF-8 in the resulting
305 data structure regardless of whether that's true or not.
306
307 Perl isn't too happy about corrupted UTF-8 in strings, but should
308 generally not crash or do similarly evil things. Extensions might
309 be not so forgiving, so it's recommended to turn on this setting if
310 you receive untrusted CBOR.
311
312 This option does not affect "encode" in any way - strings that are
313 supposedly valid UTF-8 will simply be dumped into the resulting
314 CBOR string without checking whether that is, in fact, true or not.
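
As an illustration, the hand-crafted input below is a CBOR text string
whose payload is not valid UTF-8. The byte values were chosen for this
example only:

```perl
use strict;
use warnings;
use CBOR::XS;

# 0x62 announces a text string of two octets, but 0xc3 0x28 is not
# a valid UTF-8 sequence.
my $bad = "\x62\xc3\x28";

# The default coder accepts the data without looking at the payload.
my $lenient_ok = eval { CBOR::XS->new->decode ($bad); 1 };

# With validate_utf8 enabled, decode croaks instead.
my $strict_ok  = eval { CBOR::XS->new->validate_utf8->decode ($bad); 1 };
```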
315
316 $cbor = $cbor->filter ([$cb->($tag, $value)])
317 $cb_or_undef = $cbor->get_filter
318 Sets or replaces the tagged value decoding filter (when $cb is
319 specified) or clears the filter (if no argument or "undef" is
320 provided).
321
322 The filter callback is called only during decoding, when a non-
323 enforced tagged value has been decoded (see "TAG HANDLING AND
324 EXTENSIONS" for a list of enforced tags). For specific tags, it's
325 often better to provide a default converter using the
326 %CBOR::XS::FILTER hash (see below).
327
328 The first argument is the numerical tag, the second is the
329 (decoded) value that has been tagged.
330
331 The filter function should return either exactly one value, which
332 will replace the tagged value in the decoded data structure, or no
333 values, which will result in default handling, which currently
334 means the decoder creates a "CBOR::XS::Tagged" object to hold the
335 tag and the value.
336
When the filter is cleared (the default state), the default filter
function, "CBOR::XS::default_filter", is used. This function simply
looks up the tag in the %CBOR::XS::FILTER hash. If an entry exists
it must be a code reference that is called with tag and value, and
is responsible for decoding the value. If no entry exists, it
returns no values. "CBOR::XS" provides a number of default filter
functions already, and the %CBOR::XS::FILTER hash can be freely
extended with more.
345
"CBOR::XS" additionally provides an alternative filter function
that is supposed to be safe to use with untrusted data (which the
default filter might not be), called "CBOR::XS::safe_filter", which
works the same as the "default_filter" but uses the
%CBOR::XS::SAFE_FILTER variable instead. It is prepopulated with
the tag decoding functions that are deemed safe (basically the same
as %CBOR::XS::FILTER without all the bignum tags), and can be
extended by user code as well. Obviously, one should be very
careful about adding decoding functions here, since the expectation
is that they are safe to use on untrusted data.
357
358 Example: decode all tags not handled internally into
359 "CBOR::XS::Tagged" objects, with no other special handling (useful
360 when working with potentially "unsafe" CBOR data).
361
362 CBOR::XS->new->filter (sub { })->decode ($cbor_data);
363
364 Example: provide a global filter for tag 1347375694, converting the
365 value into some string form.
366
    $CBOR::XS::FILTER{1347375694} = sub {
       my ($tag, $value) = @_;

       "tag 1347375694 value $value"
    };
372
373 Example: provide your own filter function that looks up tags in
374 your own hash:
375
    my %my_filter = (
       998347484 => sub {
          my ($tag, $value) = @_;

          "tag 998347484 value $value"
       },
    );

    my $coder = CBOR::XS->new->filter (sub {
       # call the matching handler with the same arguments,
       # or return no values for default handling
       &{ $my_filter{$_[0]} or return }
    });
387
388 Example: use the safe filter function (see "SECURITY
389 CONSIDERATIONS" for more considerations on security).
390
391 CBOR::XS->new->filter (\&CBOR::XS::safe_filter)->decode ($cbor_data);
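
Putting the pieces together, the following self-contained sketch encodes
two tagged values and decodes them with a custom filter. The tag numbers
and the "custom(...)" string format are arbitrary choices for this
example:

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new->filter (sub {
    my ($tag, $value) = @_;

    # One value returned: it replaces the tagged value.
    return "custom($value)" if $tag == 998347484;

    # No values returned: default handling (a CBOR::XS::Tagged object).
    return;
});

my $cbor_data = $coder->encode ([
    CBOR::XS::tag (998347484, "handled"),
    CBOR::XS::tag (998347485, "unhandled"),
]);

my $decoded = $coder->decode ($cbor_data);

# $decoded->[0] is the string produced by the filter, while
# $decoded->[1] is a CBOR::XS::Tagged object holding [tag, value].
my ($tag, $value) = @{ $decoded->[1] };
```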
392
393 $cbor_data = $cbor->encode ($perl_scalar)
394 Converts the given Perl data structure (a scalar value) to its CBOR
395 representation.
396
397 $perl_scalar = $cbor->decode ($cbor_data)
398 The opposite of "encode": expects CBOR data and tries to parse it,
399 returning the resulting simple scalar or reference. Croaks on
400 error.
401
402 ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
403 This works like the "decode" method, but instead of raising an
404 exception when there is trailing garbage after the CBOR string, it
405 will silently stop parsing there and return the number of
406 characters consumed so far.
407
This is useful if your CBOR texts are not delimited by an outer
protocol and you need to know where the first CBOR string ends and
the next one starts. CBOR strings are self-delimited, so it is
possible to concatenate CBOR strings without any delimiters or size
fields and recover their data.
413
414 CBOR::XS->new->decode_prefix ("......")
415 => ("...", 3)
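
A runnable variant of this idea, using three arbitrary sample values
that are concatenated without any delimiters:

```perl
use strict;
use warnings;
use CBOR::XS;

# Build a stream of three back-to-back CBOR values.
my $stream = "";
$stream .= encode_cbor ($_) for 1, "two", [3];

my $coder = CBOR::XS->new;
my @values;

while (length $stream) {
    my ($value, $octets) = $coder->decode_prefix ($stream);
    push @values, $value;
    substr $stream, 0, $octets, "";   # remove the decoded CBOR value
}
```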
416
417 INCREMENTAL PARSING
In some cases, there is the need for incremental parsing of CBOR data.
While this module always has to keep both the CBOR data and the
resulting Perl data structure in memory at one time, it does allow you
to parse a CBOR stream incrementally. This works in a way similar to
using "decode_prefix" to see if a full CBOR object is available, but is
much more efficient.
423
It basically works by parsing as much of a CBOR string as possible. If
the CBOR data is not complete yet, the parser will remember where it
was, to be able to restart when more data has been accumulated. Once
enough data is available to either decode a complete CBOR value or
raise an error, a real decode will be attempted.
429
430 A typical use case would be a network protocol that consists of sending
431 and receiving CBOR-encoded messages. The solution that works with CBOR
432 and about anything else is by prepending a length to every CBOR value,
433 so the receiver knows how many octets to read. More compact (and
434 slightly slower) would be to just send CBOR values back-to-back, as
435 "CBOR::XS" knows where a CBOR value ends, and doesn't need an explicit
436 length.
437
438 The following methods help with this:
439
440 @decoded = $cbor->incr_parse ($buffer)
441 This method attempts to decode exactly one CBOR value from the
442 beginning of the given $buffer. The value is removed from the
443 $buffer on success. When $buffer doesn't contain a complete value
444 yet, it returns nothing. Finally, when the $buffer doesn't start
445 with something that could ever be a valid CBOR value, it raises an
446 exception, just as "decode" would. In the latter case the decoder
447 state is undefined and must be reset before being able to parse
448 further.
449
450 This method modifies the $buffer in place. When no CBOR value can
451 be decoded, the decoder stores the current string offset. On the
452 next call, continues decoding at the place where it stopped before.
453 For this to make sense, the $buffer must begin with the same octets
454 as on previous unsuccessful calls.
455
456 You can call this method in scalar context, in which case it either
457 returns a decoded value or "undef". This makes it impossible to
458 distinguish between CBOR null values (which decode to "undef") and
459 an unsuccessful decode, which is often acceptable.
460
461 @decoded = $cbor->incr_parse_multiple ($buffer)
462 Same as "incr_parse", but attempts to decode as many CBOR values as
463 possible in one go, instead of at most one. Calls to "incr_parse"
464 and "incr_parse_multiple" can be interleaved.
465
466 $cbor->incr_reset
467 Resets the incremental decoder. This throws away any saved state,
468 so that subsequent calls to "incr_parse" or "incr_parse_multiple"
469 start to parse a new CBOR value from the beginning of the $buffer
470 again.
471
472 This method can be called at any time, but it must be called if you
473 want to change your $buffer or there was a decoding error and you
474 want to reuse the $cbor object for future incremental parsings.
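
The following sketch simulates a network read loop by cutting two
encoded messages into three-octet chunks. The chunk size and the
messages are arbitrary; a real program would append whatever its socket
delivers:

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new;

# Two messages sent back-to-back, then split into small chunks.
my $wire   = encode_cbor ({ msg => "hello" }) . encode_cbor ([1, 2, 3]);
my @chunks = unpack "(a3)*", $wire;

my $buffer = "";
my @messages;

for my $chunk (@chunks) {
    $buffer .= $chunk;   # only ever append to the buffer

    # Returns every value that is complete so far (possibly none)
    # and removes the decoded octets from $buffer.
    push @messages, $coder->incr_parse_multiple ($buffer);
}
```

Because the buffer is only appended to between calls, it always begins
with the same octets as on the previous unsuccessful call, as required
above.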
475
477 This section describes how CBOR::XS maps Perl values to CBOR values and
478 vice versa. These mappings are designed to "do the right thing" in most
479 circumstances automatically, preserving round-tripping characteristics
480 (what you put in comes out as something equivalent).
481
482 For the more enlightened: note that in the following descriptions,
483 lowercase perl refers to the Perl interpreter, while uppercase Perl
484 refers to the abstract Perl language itself.
485
486 CBOR -> PERL
487 integers
488 CBOR integers become (numeric) perl scalars. On perls without 64
489 bit support, 64 bit integers will be truncated or otherwise
490 corrupted.
491
492 byte strings
493 Byte strings will become octet strings in Perl (the Byte values
494 0..255 will simply become characters of the same value in Perl).
495
496 UTF-8 strings
UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will
be decoded into proper Unicode code points. By default, the
validity of the UTF-8 octets is not checked, so corrupt input will
result in corrupted Perl strings (see "validate_utf8").
501
502 arrays, maps
503 CBOR arrays and CBOR maps will be converted into references to a
504 Perl array or hash, respectively. The keys of the map will be
505 stringified during this process.
506
507 null
508 CBOR null becomes "undef" in Perl.
509
510 true, false, undefined
These CBOR values become "Types::Serialiser::true",
"Types::Serialiser::false" and "Types::Serialiser::error",
respectively. They are overloaded to act almost exactly like the
numbers 1 and 0 (for true and false) or to throw an exception on
access (for error). See the Types::Serialiser manpage for details.
516
517 tagged values
Tagged items consist of a numeric tag and another CBOR value.
519
520 See "TAG HANDLING AND EXTENSIONS" and the description of "->filter"
521 for details on which tags are handled how.
522
523 anything else
524 Anything else (e.g. unsupported simple values) will raise a
525 decoding error.
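
To make the mapping concrete, this sketch decodes a small hand-assembled
CBOR array. The input bytes were written out by hand for the example and
encode [42, "abc", true, false, null]:

```perl
use strict;
use warnings;
use CBOR::XS;
use Types::Serialiser;

# 85        array of five items
# 18 2a     unsigned integer 42
# 63 616263 text string "abc"
# f5 f4 f6  true, false, null
my $cbor_data = pack "H*", "85182a63616263f5f4f6";
my $decoded   = decode_cbor ($cbor_data);

my $number   = $decoded->[0];                                  # numeric scalar
my $text     = $decoded->[1];                                  # Perl string
my $is_true  = Types::Serialiser::is_true  ($decoded->[2]);
my $is_false = Types::Serialiser::is_false ($decoded->[3]);
my $is_undef = !defined $decoded->[4];                         # null => undef
```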
526
527 PERL -> CBOR
528 The mapping from Perl to CBOR is slightly more difficult, as Perl is a
529 typeless language. That means this module can only guess which CBOR
530 type is meant by a perl value.
531
532 hash references
533 Perl hash references become CBOR maps. As there is no inherent
534 ordering in hash keys (or CBOR maps), they will usually be encoded
535 in a pseudo-random order. This order can be different each time a
536 hash is encoded.
537
538 Currently, tied hashes will use the indefinite-length format, while
539 normal hashes will use the fixed-length format.
540
541 array references
542 Perl array references become fixed-length CBOR arrays.
543
544 other references
545 Other unblessed references will be represented using the
546 indirection tag extension (tag value 22098,
547 <http://cbor.schmorp.de/indirection>). CBOR decoders are guaranteed
548 to be able to decode these values somehow, by either "doing the
549 right thing", decoding into a generic tagged object, simply
550 ignoring the tag, or something else.
551
552 CBOR::XS::Tagged objects
553 Objects of this type must be arrays consisting of a single "[tag,
554 value]" pair. The (numerical) tag will be encoded as a CBOR tag,
555 the value will be encoded as appropriate for the value. You must
556 use "CBOR::XS::tag" to create such objects.
557
558 Types::Serialiser::true, Types::Serialiser::false,
559 Types::Serialiser::error
560 These special values become CBOR true, CBOR false and CBOR
561 undefined values, respectively. You can also use "\1", "\0" and
562 "\undef" directly if you want.
563
564 other blessed objects
565 Other blessed objects are serialised via "TO_CBOR" or "FREEZE". See
566 "TAG HANDLING AND EXTENSIONS" for specific classes handled by this
567 module, and "OBJECT SERIALISATION" for generic object
568 serialisation.
569
570 simple scalars
571 Simple Perl scalars (any scalar that is not a reference) are the
572 most difficult objects to encode: CBOR::XS will encode undefined
573 scalars as CBOR null values, scalars that have last been used in a
574 string context before encoding as CBOR strings, and anything else
575 as number value:
576
577 # dump as number
578 encode_cbor [2] # yields [2]
579 encode_cbor [-3.0e17] # yields [-3e+17]
580 my $value = 5; encode_cbor [$value] # yields [5]
581
582 # used as string, so dump as string (either byte or text)
583 print $value;
584 encode_cbor [$value] # yields ["5"]
585
586 # undef becomes null
587 encode_cbor [undef] # yields [null]
588
589 You can force the type to be a CBOR string by stringifying it:
590
591 my $x = 3.1; # some variable containing a number
592 "$x"; # stringified
593 $x .= ""; # another, more awkward way to stringify
594 print $x; # perl does it for you, too, quite often
595
596 You can force whether a string is encoded as byte or text string by
597 using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is
598 disabled).
599
600 utf8::upgrade $x; # encode $x as text string
601 utf8::downgrade $x; # encode $x as byte string
602
603 More options are available, see "TYPE CASTS", below, and the
604 "text_keys" and "text_strings" options.
605
606 Perl doesn't define what operations up- and downgrade strings, so
607 if the difference between byte and text is important, you should
608 up- or downgrade your string as late as possible before encoding.
609 You can also force the use of CBOR text strings by using
610 "text_keys" or "text_strings".
611
612 You can force the type to be a CBOR number by numifying it:
613
614 my $x = "3"; # some variable containing a string
615 $x += 0; # numify it, ensuring it will be dumped as a number
616 $x *= 1; # same thing, the choice is yours.
617
Apart from the experimental type casts described in "TYPE CASTS",
below, you cannot currently force the type in other, less obscure,
ways. Tell me if you need this capability (but don't forget to
explain why it's needed :).
621
622 Perl values that seem to be integers generally use the shortest
623 possible representation. Floating-point values will use either the
624 IEEE single format if possible without loss of precision, otherwise
625 the IEEE double format will be used. Perls that use formats other
626 than IEEE double to represent numerical values are supported, but
627 might suffer loss of precision.
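
The rules above can be checked by looking at a hex dump of the encoded
data. In this sketch, "cbor_hex" is a hypothetical helper defined only
for the example:

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical helper: encode one value and return the octets as hex.
sub cbor_hex { unpack "H*", encode_cbor ($_[0]) }

my $number = "3"; $number += 0;            # numified string
my $text   = "abc"; utf8::upgrade $text;   # upgraded: text string
my $bytes  = "abc"; utf8::downgrade $bytes; # downgraded: byte string

my @hex = map { cbor_hex ($_) } 5, "5", undef, $number, $text, $bytes;

# 5       => "05"        (integer)
# "5"     => "4135"      (one-octet byte string)
# undef   => "f6"        (null)
# $number => "03"        (integer)
# $text   => "63616263"  (text string)
# $bytes  => "43616263"  (byte string)
```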
628
629 TYPE CASTS
630 EXPERIMENTAL: As an experimental extension, "CBOR::XS" allows you to
631 force specific CBOR types to be used when encoding. That allows you to
632 encode types not normally accessible (e.g. half floats) as well as
633 force string types even when "text_strings" is in effect.
634
635 Type forcing is done by calling a special "cast" function which keeps a
636 copy of the value and returns a new value that can be handed over to
637 any CBOR encoder function.
638
639 The following casts are currently available (all of which are unary
640 operators, that is, have a prototype of "$"):
641
642 CBOR::XS::as_int $value
643 Forces the value to be encoded as some form of (basic, not bignum)
644 integer type.
645
646 CBOR::XS::as_text $value
647 Forces the value to be encoded as (UTF-8) text values.
648
649 CBOR::XS::as_bytes $value
650 Forces the value to be encoded as a (binary) string value.
651
652 Example: encode a perl string as binary even though "text_strings"
653 is in effect.
654
    CBOR::XS->new->text_strings->encode ([4, "text", CBOR::XS::as_bytes "bytevalue"]);
656
657 CBOR::XS::as_bool $value
658 Converts a Perl boolean (which can be any kind of scalar) into a
659 CBOR boolean. Strictly the same, but shorter to write, than:
660
661 $value ? Types::Serialiser::true : Types::Serialiser::false
662
663 CBOR::XS::as_float16 $value
664 Forces half-float (IEEE 754 binary16) encoding of the given value.
665
666 CBOR::XS::as_float32 $value
667 Forces single-float (IEEE 754 binary32) encoding of the given
668 value.
669
670 CBOR::XS::as_float64 $value
671 Forces double-float (IEEE 754 binary64) encoding of the given
672 value.
673
674 CBOR::XS::as_cbor $cbor_text
675 Not a type cast per-se, this type cast forces the argument to be
676 encoded as-is. This can be used to embed pre-encoded CBOR data.
677
Note that no checking on the validity of the $cbor_text is done -
it's the caller's responsibility to correctly encode values.
680
681 CBOR::XS::as_map [key => value...]
682 Treat the array reference as key value pairs and output a CBOR map.
683 This allows you to generate CBOR maps with arbitrary key types (or,
684 if you don't care about semantics, duplicate keys or pairs in a
685 custom order), which is otherwise hard to do with Perl.
686
687 The single argument must be an array reference with an even number
688 of elements.
689
690 Note that only the reference to the array is copied, the array
691 itself is not. Modifications done to the array before calling an
692 encoding function will be reflected in the encoded output.
693
694 Example: encode a CBOR map with a string and an integer as keys.
695
696 encode_cbor CBOR::XS::as_map [string => "value", 5 => "value"]
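
The effect of several casts can be seen in a hex dump. This sketch
assumes a module version that provides the "as_*" functions described
above; "cbor_hex" is again a hypothetical helper used only here:

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical helper: encode one value and return the octets as hex.
sub cbor_hex { unpack "H*", encode_cbor ($_[0]) }

my %hex = (
    int     => cbor_hex (CBOR::XS::as_int ("5")),       # "05"
    text    => cbor_hex (CBOR::XS::as_text ("abc")),    # "63616263"
    bytes   => cbor_hex (CBOR::XS::as_bytes ("abc")),   # "43616263"
    true    => cbor_hex (CBOR::XS::as_bool (1)),        # "f5"
    false   => cbor_hex (CBOR::XS::as_bool (0)),        # "f4"
    float16 => cbor_hex (CBOR::XS::as_float16 (1)),     # "f93c00"
    map     => cbor_hex (CBOR::XS::as_map ([a => 1])),  # "a1416101"
);
```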
697
698 OBJECT SERIALISATION
This module implements both a CBOR-specific and the generic
Types::Serialiser object serialisation protocol. The following
subsections explain both methods.
702
703 ENCODING
704
This module knows two ways to serialise a Perl object: the
CBOR-specific way, and the generic way.
707
708 Whenever the encoder encounters a Perl object that it cannot serialise
709 directly (most of them), it will first look up the "TO_CBOR" method on
710 it.
711
If it has a "TO_CBOR" method, it will call it with the object as only
argument, and expects exactly one return value, which it will then
encode in place of the object.
715
716 Otherwise, it will look up the "FREEZE" method. If it exists, it will
717 call it with the object as first argument, and the constant string
718 "CBOR" as the second argument, to distinguish it from other
719 serialisers.
720
721 The "FREEZE" method can return any number of values (i.e. zero or
722 more). These will be encoded as CBOR perl object, together with the
723 classname.
724
These methods MUST NOT change the data structure that is being
serialised. Failure to comply with this can result in memory
corruption - and worse.
728
729 If an object supports neither "TO_CBOR" nor "FREEZE", encoding will
730 fail with an error.
731
732 DECODING
733
734 Objects encoded via "TO_CBOR" cannot (normally) be automatically
735 decoded, but objects encoded via "FREEZE" can be decoded using the
736 following protocol:
737
738 When an encoded CBOR perl object is encountered by the decoder, it will
739 look up the "THAW" method, by using the stored classname, and will fail
740 if the method cannot be found.
741
742 After the lookup it will call the "THAW" method with the stored
743 classname as first argument, the constant string "CBOR" as second
744 argument, and all values returned by "FREEZE" as remaining arguments.
745
746 EXAMPLES
747
748 Here is an example "TO_CBOR" method:
749
750 sub My::Object::TO_CBOR {
751 my ($obj) = @_;
752
753 ["this is a serialised My::Object object", $obj->{id}]
754 }
755
756 When a "My::Object" is encoded to CBOR, it will instead encode a simple
757 array with two members: a string, and the "object id". Decoding this
758 CBOR string will yield a normal perl array reference in place of the
759 object.
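
The one-way nature of "TO_CBOR" can be checked with a short round trip.
This is only a sketch: the "new" constructor and the id value are made
up for illustration, and only "TO_CBOR" itself is taken from the text
above.

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical class for illustration; only TO_CBOR matters to the encoder.
package My::Object {
    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    sub TO_CBOR {
        my ($obj) = @_;
        return ["this is a serialised My::Object object", $obj->{id}];
    }
}

my $cbor    = encode_cbor(My::Object->new(id => 42));
my $decoded = decode_cbor($cbor);

# The object identity is gone: what comes back is a plain array reference.
printf "%s with %d members, id %s\n",
    ref($decoded), scalar @$decoded, $decoded->[1];
```

If the class has to survive the round trip, "FREEZE" and "THAW" are the
mechanism to use instead.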
760
761 A more useful and practical example would be a serialisation method for
762 the URI module. CBOR has a custom tag value for URIs, namely 32:
763
    sub URI::TO_CBOR {
        my ($self) = @_;
        my $uri = "$self";     # stringify uri
        utf8::upgrade $uri;    # make sure it will be encoded as UTF-8 string
        CBOR::XS::tag 32, $uri # tag the upgraded copy, not the object itself
    }
770
This will encode URIs as a UTF-8 string with tag 32, which indicates a
URI.

Decoding such a URI gives you a URI object only when a decoding filter
for tag 32 is in effect (see "NON-ENFORCED TAGS"); otherwise you get a
CBOR::XS::Tagged object with tag number 32 and the string - exactly
what was returned by "TO_CBOR".
777
778 To serialise an object so it can automatically be deserialised, you
779 need to use "FREEZE" and "THAW". To take the URI module as example,
780 this would be a possible implementation:
781
    sub URI::FREEZE {
        my ($self, $serialiser) = @_;
        "$self" # encode url string
    }

    sub URI::THAW {
        my ($class, $serialiser, $uri) = @_;
        $class->new ($uri)
    }
791
792 Unlike "TO_CBOR", multiple values can be returned by "FREEZE". For
793 example, a "FREEZE" method that returns "type", "id" and "variant"
794 values would cause an invocation of "THAW" with 5 arguments:
795
    sub My::Object::FREEZE {
        my ($self, $serialiser) = @_;

        ($self->{type}, $self->{id}, $self->{variant})
    }

    sub My::Object::THAW {
        my ($class, $serialiser, $type, $id, $variant) = @_;

        $class->new (type => $type, id => $id, variant => $variant)
    }
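
To see the whole "FREEZE"/"THAW" protocol in action, the following
sketch round-trips an object and records what each hook receives. The
class "My::Point", its "new" constructor, the "x" and "y" attributes
and the @seen log are assumptions made for this example only.

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical class; the attribute names x and y are made up for this sketch.
package My::Point {
    our @seen;   # records the serialiser name passed to each hook

    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    sub FREEZE {
        my ($self, $serialiser) = @_;
        push @seen, "FREEZE:$serialiser";
        return ($self->{x}, $self->{y});
    }

    sub THAW {
        my ($class, $serialiser, $x, $y) = @_;
        push @seen, "THAW:$serialiser";
        return $class->new(x => $x, y => $y);
    }
}

my $copy = decode_cbor(encode_cbor(My::Point->new(x => 1, y => 2)));

print ref($copy), " (", join(", ", @My::Point::seen), ")\n";
```

Both hooks are handed the string "CBOR" as serialiser name, so a class
that also supports other serialisers can branch on it.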
807
809 There is no way to distinguish CBOR from other formats
810 programmatically. To make it easier to distinguish CBOR from other
811 formats, the CBOR specification has a special "magic string" that can
812 be prepended to any CBOR string without changing its meaning.
813
814 This string is available as $CBOR::XS::MAGIC. This module does not
815 prepend this string to the CBOR data it generates, but it will ignore
816 it if present, so users can prepend this string as a "file type"
817 indicator as required.
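
A minimal sketch of using the magic string as a "file type" marker
follows. The variable names are illustrative only; the check itself is
a plain prefix comparison against $CBOR::XS::MAGIC.

```perl
use strict;
use warnings;
use CBOR::XS;

my $payload = encode_cbor([1, 2, 3]);
my $file    = $CBOR::XS::MAGIC . $payload;   # explicit "file type" marker

# Cheap sniff test: does the buffer start with the magic string?
my $magic_len = length $CBOR::XS::MAGIC;
my $has_magic = substr($file, 0, $magic_len) eq $CBOR::XS::MAGIC;

# Both forms decode to the same data; the prefix is skipped.
my $plain  = decode_cbor($payload);
my $marked = decode_cbor($file);

print "magic present: ", ($has_magic ? "yes" : "no"), "\n";
```

A missing prefix proves nothing, of course - it only means the producer
chose not to add it.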
818
CBOR has the concept of tagged values - any CBOR value can be tagged
with a 64 bit tag number. Tag numbers are centrally administered.
822
823 "CBOR::XS" handles a few tags internally when en- or decoding. You can
824 also create tags yourself by encoding "CBOR::XS::Tagged" objects, and
825 the decoder will create "CBOR::XS::Tagged" objects itself when it hits
826 an unknown tag.
827
828 These objects are simply blessed array references - the first member of
829 the array being the numerical tag, the second being the value.
830
831 You can interact with "CBOR::XS::Tagged" objects in the following ways:
832
833 $tagged = CBOR::XS::tag $tag, $value
834 This function(!) creates a new "CBOR::XS::Tagged" object using the
835 given $tag (0..2**64-1) to tag the given $value (which can be any
836 Perl value that can be encoded in CBOR, including serialisable Perl
837 objects and "CBOR::XS::Tagged" objects).
838
839 $tagged->[0]
840 $tagged->[0] = $new_tag
841 $tag = $tagged->tag
842 $new_tag = $tagged->tag ($new_tag)
843 Access/mutate the tag.
844
845 $tagged->[1]
846 $tagged->[1] = $new_value
847 $value = $tagged->value
848 $new_value = $tagged->value ($new_value)
849 Access/mutate the tagged value.
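
The array-slot access described above can be sketched as follows. The
tag number 60000 is an arbitrary choice for this example and is assumed
not to be handled specially by the module.

```perl
use strict;
use warnings;
use CBOR::XS;

# 60000 is an arbitrary tag number chosen for this sketch; it is assumed
# not to be handled specially by the module.
my $tagged = CBOR::XS::tag(60000, "payload");

# A CBOR::XS::Tagged object is a blessed two-element array: [tag, value].
my ($tag, $value) = @$tagged;

# Mutate through the array slots, then round-trip the result.
$tagged->[1] = [1, 2, 3];
my $back = decode_cbor(encode_cbor($tagged));

printf "%s: tag %d wrapping %d elements\n",
    ref($back), $back->[0], scalar @{ $back->[1] };
```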
850
851 EXAMPLES
852 Here are some examples of "CBOR::XS::Tagged" uses to tag objects.
853
You can look up CBOR tag values and meanings in the IANA registry at
<http://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>.
856
857 Prepend a magic header ($CBOR::XS::MAGIC):
858
    my $cbor = encode_cbor CBOR::XS::tag 55799, $value;
    # same as:
    my $cbor = $CBOR::XS::MAGIC . encode_cbor $value;
862
863 Serialise some URIs and a regex in an array:
864
    my $cbor = encode_cbor [
        (CBOR::XS::tag 32, "http://www.nethype.de/"),
        (CBOR::XS::tag 32, "http://software.schmorp.de/"),
        (CBOR::XS::tag 35, "^[Pp][Ee][Rr][lL]\$"),
    ];
870
871 Wrap CBOR data in CBOR:
872
    my $cbor_cbor = encode_cbor
        CBOR::XS::tag 24,
        encode_cbor [1, 2, 3];
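
Unwrapping such embedded CBOR again takes two decoding steps. The
sketch below uses a coder with an empty filter so that tag 24 is
guaranteed to arrive as a "CBOR::XS::Tagged" object; the variable names
are illustrative.

```perl
use strict;
use warnings;
use CBOR::XS;

my $inner_cbor = encode_cbor([1, 2, 3]);
my $outer_cbor = encode_cbor(CBOR::XS::tag(24, $inner_cbor));

# An empty filter leaves every non-enforced tag as a CBOR::XS::Tagged object.
my $coder   = CBOR::XS->new->filter(sub { });
my $wrapper = $coder->decode($outer_cbor);

# First step yields [tag, byte string]; second step decodes the byte string.
my ($tag, $bytes) = @$wrapper;
my $inner = decode_cbor($bytes);

print "tag $tag wraps: @$inner\n";
```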
876
878 This section describes how this module handles specific tagged values
879 and extensions. If a tag is not mentioned here and no additional
880 filters are provided for it, then the default handling applies
881 (creating a CBOR::XS::Tagged object on decoding, and only encoding the
882 tag when explicitly requested).
883
884 Tags not handled specifically are currently converted into a
885 CBOR::XS::Tagged object, which is simply a blessed array reference
886 consisting of the numeric tag value followed by the (decoded) CBOR
887 value.
888
889 Future versions of this module reserve the right to special case
890 additional tags (such as base64url).
891
892 ENFORCED TAGS
893 These tags are always handled when decoding, and their handling cannot
894 be overridden by the user.
895
896 26 (perl-object, <http://cbor.schmorp.de/perl-object>)
897 These tags are automatically created (and decoded) for serialisable
objects using the "FREEZE/THAW" methods (the Types::Serialiser
899 object serialisation protocol). See "OBJECT SERIALISATION" for
900 details.
901
902 28, 29 (shareable, sharedref, <http://cbor.schmorp.de/value-sharing>)
These tags are automatically decoded when encountered (as long as
they do not result in a cyclic data structure, see "allow_cycles"),
905 resulting in shared values in the decoded object. They are only
906 encoded, however, when "allow_sharing" is enabled.
907
908 Not all shared values can be successfully decoded: values that
909 reference themselves will currently decode as "undef" (this is not
910 the same as a reference pointing to itself, which will be
911 represented as a value that contains an indirect reference to
912 itself - these will be decoded properly).
913
914 Note that considerably more shared value data structures can be
915 decoded than will be encoded - currently, only values pointed to by
916 references will be shared, others will not. While non-reference
917 shared values can be generated in Perl with some effort, they were
918 considered too unimportant to be supported in the encoder. The
919 decoder, however, will decode these values as shared values.
920
921 256, 25 (stringref-namespace, stringref,
922 <http://cbor.schmorp.de/stringref>)
923 These tags are automatically decoded when encountered. They are
924 only encoded, however, when "pack_strings" is enabled.
925
926 22098 (indirection, <http://cbor.schmorp.de/indirection>)
This tag is automatically generated when a reference is
928 encountered (with the exception of hash and array references). It
929 is converted to a reference when decoding.
930
931 55799 (self-describe CBOR, RFC 7049)
932 This value is not generated on encoding (unless explicitly
933 requested by the user), and is simply ignored when decoding.
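
The effect of the value-sharing tags (28, 29) described above can be
observed by encoding the same structure with and without
"allow_sharing". This is a sketch only; the data and variable names are
made up for illustration.

```perl
use strict;
use warnings;
use CBOR::XS;

my $inner = { name => "original" };
my $data  = [$inner, $inner];          # the same hash, referenced twice

# Default encoder: the hash is written out twice.
my $cloned = decode_cbor(CBOR::XS->new->encode($data));

# With allow_sharing the second occurrence becomes a sharedref (tag 29).
my $shared = decode_cbor(CBOR::XS->new->allow_sharing->encode($data));

# Change the first member of each copy and look at the second member.
$_->[0]{name} = "changed" for $cloned, $shared;

print "cloned: $cloned->[1]{name}\n";
print "shared: $shared->[1]{name}\n";
```

Note that the decoder needed no extra setting here: only the encoder
has to opt in.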
934
935 NON-ENFORCED TAGS
936 These tags have default filters provided when decoding. Their handling
937 can be overridden by changing the %CBOR::XS::FILTER entry for the tag,
938 or by providing a custom "filter" callback when decoding.
939
940 When they result in decoding into a specific Perl class, the module
941 usually provides a corresponding "TO_CBOR" method as well.
942
943 When any of these need to load additional modules that are not part of
944 the perl core distribution (e.g. URI), it is (currently) up to the user
945 to provide these modules. The decoding usually fails with an exception
946 if the required module cannot be loaded.
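
As a sketch of the override mechanism, a per-coder "filter" callback
might look like the following. It relies on the callback contract of
the "filter" method (one return value replaces the tagged value, no
return value selects the default "CBOR::XS::Tagged" handling); the
angle-bracket formatting and tag number 60000 are arbitrary choices for
this example.

```perl
use strict;
use warnings;
use CBOR::XS;

my $cbor = encode_cbor([
    CBOR::XS::tag(32,    "http://example.org/"),
    CBOR::XS::tag(60000, "something else"),    # arbitrary, unhandled tag
]);

my $coder = CBOR::XS->new->filter(sub {
    my ($tag, $value) = @_;

    # Exactly one return value: it replaces the tagged value.
    return "<$value>" if $tag == 32;

    # No return value: fall back to a CBOR::XS::Tagged object.
    return;
});

my ($uri, $other) = @{ $coder->decode($cbor) };

print "uri: $uri, other: ", ref($other), "\n";
```

Because the callback replaces the default filter for this coder, no URI
module is loaded here, which may be exactly what you want.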
947
948 0, 1 (date/time string, seconds since the epoch)
949 These tags are decoded into Time::Piece objects. The corresponding
950 "Time::Piece::TO_CBOR" method always encodes into tag 1 values
951 currently.
952
953 The Time::Piece API is generally surprisingly bad, and fractional
954 seconds are only accidentally kept intact, so watch out. On the
955 plus side, the module comes with perl since 5.10, which has to
956 count for something.
957
958 2, 3 (positive/negative bignum)
959 These tags are decoded into Math::BigInt objects. The corresponding
960 "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal
961 CBOR integers, and others into positive/negative CBOR bignums.
962
963 4, 5, 264, 265 (decimal fraction/bigfloat)
964 Both decimal fractions and bigfloats are decoded into
965 Math::BigFloat objects. The corresponding "Math::BigFloat::TO_CBOR"
966 method always encodes into a decimal fraction (either tag 4 or
967 264).
968
969 NaN and infinities are not encoded properly, as they cannot be
970 represented in CBOR.
971
972 See "BIGNUM SECURITY CONSIDERATIONS" for more info.
973
974 30 (rational numbers)
975 These tags are decoded into Math::BigRat objects. The corresponding
976 "Math::BigRat::TO_CBOR" method encodes rational numbers with
977 denominator 1 via their numerator only, i.e., they become normal
978 integers or "bignums".
979
980 See "BIGNUM SECURITY CONSIDERATIONS" for more info.
981
982 21, 22, 23 (expected later JSON conversion)
983 CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore
984 these tags.
985
986 32 (URI)
987 These objects decode into URI objects. The corresponding
988 "URI::TO_CBOR" method again results in a CBOR URI value.
989
991 CBOR is supposed to implement a superset of the JSON data model, and
992 is, with some coercion, able to represent all JSON texts (something
993 that other "binary JSON" formats such as BSON generally do not
994 support).
995
996 CBOR implements some extra hints and support for JSON interoperability,
997 and the spec offers further guidance for conversion between CBOR and
JSON. None of this is currently implemented in CBOR::XS, and the
guidelines in the spec do not result in correct round-tripping of data.
If JSON interoperability is improved in the future, then the goal will
be to ensure that decoded JSON data will round-trip encoding and
decoding to CBOR intact.
1003
1005 Tl;dr... if you want to decode or encode CBOR from untrusted sources,
1006 you should start with a coder object created via "new_safe" (which
1007 implements the mitigations explained below):
1008
    my $coder = CBOR::XS->new_safe;

    my $data = $coder->decode ($cbor_text);
    my $cbor = $coder->encode ($data);
1013
1014 Longer version: When you are using CBOR in a protocol, talking to
1015 untrusted potentially hostile creatures requires some thought:
1016
1017 Security of the CBOR decoder itself
1018 First and foremost, your CBOR decoder should be secure, that is,
1019 should not have any buffer overflows or similar bugs that could
1020 potentially be exploited. Obviously, this module should ensure that
and I am trying hard to make that true, but you never know.
1022
1023 CBOR::XS can invoke almost arbitrary callbacks during decoding
1024 CBOR::XS supports object serialisation - decoding CBOR can cause
1025 calls to any "THAW" method in any package that exists in your
1026 process (that is, CBOR::XS will not try to load modules, but any
1027 existing "THAW" method or function can be called, so they all have
1028 to be secure).
1029
1030 Less obviously, it will also invoke "TO_CBOR" and "FREEZE" methods
1031 - even if all your "THAW" methods are secure, encoding data
1032 structures from untrusted sources can invoke those and trigger bugs
1033 in those.
1034
1035 So, if you are not sure about the security of all the modules you
1036 have loaded (you shouldn't), you should disable this part using
1037 "forbid_objects" or using "new_safe".
1038
1039 CBOR can be extended with tags that call library code
1040 CBOR can be extended with tags, and "CBOR::XS" has a registry of
1041 conversion functions for many existing tags that can be extended
1042 via third-party modules (see the "filter" method).
1043
1044 If you don't trust these, you should configure the "safe" filter
1045 function, "CBOR::XS::safe_filter" ("new_safe" does this), which by
1046 default only includes conversion functions that are considered
1047 "safe" by the author (but again, they can be extended by third
1048 party modules).
1049
1050 Depending on your level of paranoia, you can use the "safe" filter:
1051
1052 $cbor->filter (\&CBOR::XS::safe_filter);
1053
1054 ... your own filter...
1055
1056 $cbor->filter (sub { ... do your stuffs here ... });
1057
1058 ... or even no filter at all, disabling all tag decoding:
1059
1060 $cbor->filter (sub { });
1061
1062 This is never a problem for encoding, as the tag mechanism only
1063 exists in CBOR texts.
1064
1065 Resource-starving attacks: object memory usage
1066 You need to avoid resource-starving attacks. That means you should
limit the size of CBOR data you accept, or make sure that when your
1068 resources run out, that's just fine (e.g. by using a separate
1069 process that can crash safely). The size of a CBOR string in octets
1070 is usually a good indication of the size of the resources required
1071 to decode it into a Perl structure. While CBOR::XS can check the
1072 size of the CBOR text (using "max_size" - done by "new_safe"), it
1073 might be too late when you already have it in memory, so you might
1074 want to check the size before you accept the string.
1075
1076 As for encoding, it is possible to construct data structures that
1077 are relatively small but result in large CBOR texts (for example by
1078 having an array full of references to the same big data structure,
1079 which will all be deep-cloned during encoding by default). This is
1080 rarely an actual issue (and the worst case is still just running
1081 out of memory), but you can reduce this risk by using
1082 "allow_sharing".
1083
1084 Resource-starving attacks: stack overflows
1085 CBOR::XS recurses using the C stack when decoding objects and
1086 arrays. The C stack is a limited resource: for instance, on my
1087 amd64 machine with 8MB of stack size I can decode around 180k
1088 nested arrays but only 14k nested CBOR objects (due to perl itself
1089 recursing deeply on croak to free the temporary). If that is
1090 exceeded, the program crashes. To be conservative, the default
1091 nesting limit is set to 512. If your process has a smaller stack,
1092 you should adjust this setting accordingly with the "max_depth"
1093 method.
1094
1095 Resource-starving attacks: CPU en-/decoding complexity
1096 CBOR::XS will use the Math::BigInt, Math::BigFloat and Math::BigRat
libraries to encode and decode bignums. These can be very
1098 slow (as in, centuries of CPU time) and can even crash your program
1099 (and are generally not very trustworthy). See the next section on
1100 bignum security for details.
1101
1102 Data breaches: leaking information in error messages
1103 CBOR::XS might leak contents of your Perl data structures in its
1104 error messages, so when you serialise sensitive information you
1105 might want to make sure that exceptions thrown by CBOR::XS will not
1106 end up in front of untrusted eyes.
1107
1108 Something else...
1109 Something else could bomb you, too, that I forgot to think of. In
1110 that case, you get to keep the pieces. I am always open for hints,
1111 though...
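
Several of the points above (size check before decoding, a nesting
limit, keeping error messages private) can be combined into one small
wrapper. This is a sketch under stated assumptions, not a complete
defence: "decode_untrusted" is a hypothetical helper that is not part
of CBOR::XS, and the two limits are arbitrary values.

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical limits for this sketch - tune them to your protocol.
my $MAX_BYTES = 64 * 1024;
my $MAX_DEPTH = 32;

my $coder = CBOR::XS->new_safe
                    ->max_size($MAX_BYTES)
                    ->max_depth($MAX_DEPTH);

# decode_untrusted is a hypothetical helper, not part of CBOR::XS.
# It returns ($ok, $data) and never lets the exception text escape.
sub decode_untrusted {
    my ($octets) = @_;

    # Check the size before the decoder ever sees the string.
    return (0, undef) if length($octets) > $MAX_BYTES;

    my $data;
    my $ok = eval { $data = $coder->decode($octets); 1 };

    # $@ may quote parts of the data: log it privately if needed,
    # but do not send it back to the peer.
    return $ok ? (1, $data) : (0, undef);
}

my ($ok, $data) = decode_untrusted(encode_cbor({ answer => 42 }));
print $ok ? "decoded, answer is $data->{answer}\n" : "rejected\n";
```

In a real protocol the length check belongs even earlier, at the point
where the octets are read from the network.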
1112
1114 CBOR::XS provides a "TO_CBOR" method for both Math::BigInt and
1115 Math::BigFloat that tries to encode the number in the simplest possible
1116 way, that is, either a CBOR integer, a CBOR bigint/decimal fraction
1117 (tag 4) or an arbitrary-exponent decimal fraction (tag 264). Rational
1118 numbers (Math::BigRat, tag 30) can also contain bignums as members.
1119
1120 CBOR::XS will also understand base-2 bigfloat or arbitrary-exponent
1121 bigfloats (tags 5 and 265), but it will never generate these on its
1122 own.
1123
1124 Using the built-in Math::BigInt::Calc support, encoding and decoding
1125 decimal fractions is generally fast. Decoding bigints can be slow for
1126 very big numbers (tens of thousands of digits, something that could
1127 potentially be caught by limiting the size of CBOR texts), and decoding
1128 bigfloats or arbitrary-exponent bigfloats can be extremely slow
1129 (minutes, decades) for large exponents (roughly 40 bit and longer).
1130
1131 Additionally, Math::BigInt can take advantage of other bignum
1132 libraries, such as Math::GMP, which cannot handle big floats with large
1133 exponents, and might simply abort or crash your program, due to their
1134 code quality.
1135
1136 This can be a concern if you want to parse untrusted CBOR. If it is,
1137 you might want to disable decoding of tag 2 (bigint) and 3 (negative
1138 bigint) types. You should also disable types 5 and 265, as these can be
1139 slow even without bigints.
1140
1141 Disabling bigints will also partially or fully disable types that rely
1142 on them, e.g. rational numbers that use bignums.
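
One possible way to disable these tags is a filter that refuses them
and defers to the %CBOR::XS::FILTER entries for everything else. This
is a sketch, not the only approach: the %blocked hash and the sample
bignum are assumptions made for this example.

```perl
use strict;
use warnings;
use CBOR::XS;

# Tags to leave undecoded: bigints (2, 3) and bigfloats (5, 265).
my %blocked = map { $_ => 1 } 2, 3, 5, 265;

my $coder = CBOR::XS->new->filter(sub {
    my ($tag, $value) = @_;

    # No return value: the value stays a CBOR::XS::Tagged object.
    return if $blocked{$tag};

    # Everything else goes through the module's registered handlers.
    my $handler = $CBOR::XS::FILTER{$tag} or return;
    return $handler->($tag, $value);
});

# A positive bignum (tag 2) holding 2**64 as a big-endian byte string.
my $bignum_cbor = encode_cbor(
    CBOR::XS::tag(2, "\x01\x00\x00\x00\x00\x00\x00\x00\x00")
);

my $decoded = $coder->decode($bignum_cbor);
print "tag 2 decoded as: ", ref($decoded), "\n";
```

The application can then reject such "CBOR::XS::Tagged" leftovers
outright, or inspect the length of the byte string before deciding to
convert it.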
1143
1145 This section contains some random implementation notes. They do not
describe guaranteed behaviour, but merely behaviour as implemented
1147 right now.
1148
1149 64 bit integers are only properly decoded when Perl was built with 64
1150 bit support.
1151
1152 Strings and arrays are encoded with a definite length. Hashes as well,
1153 unless they are tied (or otherwise magical).
1154
1155 Only the double data type is supported for NV data types - when Perl
1156 uses long double to represent floating point values, they might not be
1157 encoded properly. Half precision types are accepted, but not encoded.
1158
1159 Strict mode and canonical mode are not implemented.
1160
1162 On perls that were built without 64 bit integer support (these are rare
1163 nowadays, even on 32 bit architectures, as all major Perl distributions
1164 are built with 64 bit integer support), support for any kind of 64 bit
1165 value in CBOR is very limited - most likely, these 64 bit values will
1166 be truncated, corrupted, or otherwise not decoded correctly. This also
1167 includes string, float, array and map sizes that are stored as 64 bit
1168 integers.
1169
1171 This module is not guaranteed to be thread safe and there are no plans
1172 to change this until Perl gets thread support (as opposed to the
1173 horribly slow so-called "threads" which are simply slow and bloated
1174 process simulations - use fork, it's much faster, cheaper, better).
1175
1176 (It might actually work, but you have been warned).
1177
1179 While the goal of this module is to be correct, that unfortunately does
1180 not mean it's bug-free, only that I think its design is bug-free. If
1181 you keep reporting bugs they will be fixed swiftly, though.
1182
1183 Please refrain from using rt.cpan.org or any other bug reporting
1184 service. I put the contact address into my modules for a reason.
1185
1187 The JSON and JSON::XS modules that do similar, but human-readable,
1188 serialisation.
1189
1190 The Types::Serialiser module provides the data model for true, false
1191 and error values.
1192
1194 Marc Lehmann <schmorp@schmorp.de>
1195 http://home.schmorp.de/
1196
1197
1198
1199perl v5.36.0 2023-01-20 XS(3)