1XS(3) User Contributed Perl Documentation XS(3)
2
3
4
NAME
6 Cpanel::JSON::XS - cPanel fork of JSON::XS, fast and correct
7 serializing
8
SYNOPSIS
10 use Cpanel::JSON::XS;
11
12 # exported functions, they croak on error
13 # and expect/generate UTF-8
14
15 $utf8_encoded_json_text = encode_json $perl_hash_or_arrayref;
16 $perl_hash_or_arrayref = decode_json $utf8_encoded_json_text;
17
18 # OO-interface
19
20 $coder = Cpanel::JSON::XS->new->ascii->pretty->allow_nonref;
21 $pretty_printed_unencoded = $coder->encode ($perl_scalar);
22 $perl_scalar = $coder->decode ($unicode_json_text);
23
24 # Note that 5.6 misses most smart utf8 and encoding functionalities
25 # of newer releases.
26
27 # Note that L<JSON::MaybeXS> will automatically use Cpanel::JSON::XS
28 # if available, at virtually no speed overhead either, so you should
29 # be able to just:
30
31 use JSON::MaybeXS;
32
33 # and do the same things, except that you have a pure-perl fallback now.
34
DESCRIPTION
36 This module converts Perl data structures to JSON and vice versa. Its
37 primary goal is to be correct and its secondary goal is to be fast. To
38 reach the latter goal it was written in C.
39
40 As this is the n-th-something JSON module on CPAN, what was the reason
41 to write yet another JSON module? While it seems there are many JSON
42 modules, none of them correctly handle all corner cases, and in most
43 cases their maintainers are unresponsive, gone missing, or not
44 listening to bug reports for other reasons.
45
46 See below for the cPanel fork.
47
48 See MAPPING, below, on how Cpanel::JSON::XS maps perl values to JSON
49 values and vice versa.
50
51 FEATURES
52 · correct Unicode handling
53
54 This module knows how to handle Unicode with Perl version higher
55 than 5.8.5, documents how and when it does so, and even documents
56 what "correct" means.
57
58 · round-trip integrity
59
60 When you serialize a perl data structure using only data types
61 supported by JSON and Perl, the deserialized data structure is
62 identical on the Perl level. (e.g. the string "2.0" doesn't
63 suddenly become "2" just because it looks like a number). There are
64 minor exceptions to this, read the MAPPING section below to learn
65 about those.
66
67 · strict checking of JSON correctness
68
There is no guessing, no generating of illegal JSON texts by
default, and only JSON is accepted as input by default. The latter
is a security feature.
72
73 · fast
74
75 Compared to other JSON modules and other serializers such as
76 Storable, this module usually compares favourably in terms of
77 speed, too.
78
79 · simple to use
80
81 This module has both a simple functional interface as well as an
82 object oriented interface.
83
84 · reasonably versatile output formats
85
86 You can choose between the most compact guaranteed-single-line
87 format possible (nice for simple line-based protocols), a pure-
88 ASCII format (for when your transport is not 8-bit clean, still
89 supports the whole Unicode range), or a pretty-printed format (for
90 when you want to read that stuff). Or you can combine those
91 features in whatever way you like.
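
The output formats listed above can be sketched as follows. This is a
minimal illustration, not part of the original documentation: the sample
data is made up, and "canonical" is enabled only so that the key order is
predictable.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Hypothetical sample data with one non-ASCII character (U+00E9).
my $data = { name => "caf\x{e9}", list => [1, 2] };

# Compact output: a single line without optional whitespace.
my $compact = Cpanel::JSON::XS->new->canonical->encode($data);

# Pure-ASCII output: characters above 127 become \uXXXX escapes.
my $ascii = Cpanel::JSON::XS->new->canonical->ascii->encode($data);

# Features can be combined: pretty-printed and pure ASCII.
my $pretty = Cpanel::JSON::XS->new->canonical->ascii->pretty->encode($data);

print "$ascii\n";    # {"list":[1,2],"name":"caf\u00e9"}
print $pretty;
```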
92
93 cPanel fork
Since the original author MLEHMANN has no public bugtracker, this
cPanel fork now sits on GitHub.
96
97 src repo: <https://github.com/rurban/Cpanel-JSON-XS> original:
98 <http://cvs.schmorp.de/JSON-XS/>
99
100 RT: <https://github.com/rurban/Cpanel-JSON-XS/issues> or
101 <https://rt.cpan.org/Public/Dist/Display.html?Queue=Cpanel-JSON-XS>
102
103 Changes to JSON::XS
104
- stricter decode_json() as documented. Non-refs are disallowed.
Added a 2nd optional argument. decode() now honors allow_nonref.
107
108 - fixed encode of numbers for dual-vars. Different string
109 representations are preserved, but numbers with temporary strings
110 which represent the same number are here treated as numbers, not
111 strings. Cpanel::JSON::XS is a bit slower, but preserves numeric
112 types better.
113
- numbers ending with .0 stay numbers and are not converted to
integers. [#63] Dual-vars which are represented as number not
integer (42+"bar" != 5.8.9) are now encoded as number (=> 42.0)
because internally it's now a NOK type. However !!1 which is
wrongly encoded in 5.8 as "1"/1.0 is still represented as integer.
119
120 - different handling of inf/nan. Default now to null, optionally with
121 stringify_infnan() to "inf"/"nan". [#28, #32]
122
123 - added "binary" extension, non-JSON and non JSON parsable, allows
124 "\xNN" and "\NNN" sequences.
125
126 - 5.6.2 support; sacrificing some utf8 features (assuming bytes
127 all-over), no multi-byte unicode characters with 5.6.
128
- interop for true/false overloading. JSON::XS, JSON::PP and Mojo::JSON
representations for booleans are accepted and JSON::XS accepts
Cpanel::JSON::XS booleans [#13, #37].
Fixed overloading of booleans. Cpanel::JSON::XS::true stringifies
again to "1", not "true", analogous to all other JSON modules.
135
136 - native boolean mapping of yes and no to true and false, as in
137 YAML::XS.
138 In perl "!0" is yes, "!1" is no.
139 The JSON value true maps to 1, false maps to 0. [#39]
140
141 - support arbitrary stringification with encode, with convert_blessed
142 and allow_blessed.
143
144 - ithread support. Cpanel::JSON::XS is thread-safe, JSON::XS not
145
146 - is_bool can be called as method, JSON::XS::is_bool not.
147
148 - performance optimizations for threaded Perls
149
150 - relaxed mode, allowing many popular extensions
151
152 - additional fixes for:
153
154 - [cpan #88061] AIX atof without USE_LONG_DOUBLE
155
156 - #10 unshare_hek crash
157
158 - #7, #29 avoid re-blessing where possible. It fails in JSON::XS for
159 READONLY values, i.e. restricted hashes.
160
161 - #41 overloading of booleans, use the object not the reference.
162
163 - #62 -Dusequadmath conversion and no SEGV.
164
- #72 parsing of values followed by \0, like 1\0, does fail.
166
167 - #72 parsing of illegal unicode or non-unicode characters.
168
169 - #96 locale-insensitive numeric conversion
170
171 - public maintenance and bugtracker
172
173 - use ppport.h, sanify XS.xs comment styles, harness C coding style
174
175 - common::sense is optional. When available it is not used in the
176 published production module, just during development and testing.
177
- extended testsuite, passes all http://seriot.ch/parsing_json.html
tests. In fact it is the only known JSON decoder which does so,
while also being the fastest.
181
182 - support many more options and methods from JSON::PP:
183 stringify_infnan, allow_unknown, allow_stringify, allow_barekey,
184 encode_stringify, allow_bignum, allow_singlequote, sort_by
185 (partially), escape_slash, convert_blessed, ... optional
186 decode_json(, allow_nonref) arg.
187 relaxed implements allow_dupkeys.
188
189 - support all 5 unicode BOM's: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE,
190 UTF-32BE, encoding internally to UTF-8.
191
193 The following convenience methods are provided by this module. They are
194 exported by default:
195
$json_text = encode_json $perl_scalar [, $json_type]
197 Converts the given Perl data structure to a UTF-8 encoded, binary
198 string (that is, the string contains octets only). Croaks on error.
199
200 This function call is functionally identical to:
201
202 $json_text = Cpanel::JSON::XS->new->utf8->encode ($perl_scalar, $json_type)
203
except being faster.
205
206 For the type argument see Cpanel::JSON::XS::Type.
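
As a small sketch of the equivalence described above (the sample data is
made up for illustration), the functional and the object-oriented call
should produce the same octets:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;    # exports encode_json and decode_json by default

my $data = { a => [1, 2] };

# Functional interface: always produces UTF-8 encoded octets.
my $functional = encode_json($data);

# The equivalent object-oriented call named in the text above.
my $oo = Cpanel::JSON::XS->new->utf8->encode($data);

print "$functional\n";    # {"a":[1,2]}
print "same result\n" if $functional eq $oo;
```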
207
208 $perl_scalar = decode_json $json_text [, $allow_nonref [, my $json_type
209 ] ]
The opposite of "encode_json": expects a UTF-8 (binary) string
holding a JSON object or array and tries to parse that as a UTF-8
encoded JSON text, returning the resulting reference. Croaks on error.
213
214 This function call is functionally identical to:
215
216 $perl_scalar = Cpanel::JSON::XS->new->utf8->decode ($json_text, $json_type)
217
218 except being faster.
219
Note that decode_json in Cpanel::JSON::XS versions older than
3.0116, and in JSON::XS, did not set allow_nonref but allowed
non-references due to a bug in the decoder.
223
If the new optional $allow_nonref argument is set and not false,
the allow_nonref option will be set and the function will act as
described in the relaxed RFC 7159, allowing all values such as
objects, arrays, strings, numbers, "null", "true", and "false".
228
229 For the type argument see Cpanel::JSON::XS::Type.
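
A short sketch of the optional $allow_nonref argument follows. The input
strings are invented for illustration. Going by the description above, the
strict call on a bare string is expected to croak, so it is wrapped in
eval here.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# An object or array decodes to a Perl reference.
my $ref = decode_json('{"a":[1,2]}');

# A true second argument sets allow_nonref, so a bare string is accepted.
my $str = decode_json('"hello"', 1);

# Without it, a non-reference value is expected to be rejected.
my $strict = eval { decode_json('"hello"') };
print "strict decode failed: $@" unless defined $strict;

print "$str\n";    # hello
```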
230
231 $is_boolean = Cpanel::JSON::XS::is_bool $scalar
232 Returns true if the passed scalar represents either
233 "JSON::XS::true" or "JSON::XS::false", two constants that act like
234 1 and 0, respectively and are used to represent JSON "true" and
235 "false" values in Perl.
236
237 See MAPPING, below, for more information on how JSON values are
238 mapped to Perl.
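
A minimal sketch of is_bool, assuming the default settings (where JSON
booleans decode to boolean objects): it tells decoded JSON booleans apart
from ordinary numbers.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Two JSON booleans and one plain JSON number.
my $decoded = decode_json('[true, false, 1]');

for my $value (@$decoded) {
    my $kind = Cpanel::JSON::XS::is_bool($value) ? "boolean" : "not a boolean";
    print "$kind\n";
}
```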
239
DEPRECATED FUNCTIONS
241 from_json
242 from_json has been renamed to decode_json
243
244 to_json
245 to_json has been renamed to encode_json
246
A FEW NOTES ON UNICODE AND PERL
248 Since this often leads to confusion, here are a few very clear words on
249 how Unicode works in Perl, modulo bugs.
250
251 1. Perl strings can store characters with ordinal values > 255.
252 This enables you to store Unicode characters as single characters
253 in a Perl string - very natural.
254
255 2. Perl does not associate an encoding with your strings.
256 ... until you force it to, e.g. when matching it against a regex,
257 or printing the scalar to a file, in which case Perl either
258 interprets your string as locale-encoded text, octets/binary, or as
259 Unicode, depending on various settings. In no case is an encoding
260 stored together with your data, it is use that decides encoding,
261 not any magical meta data.
262
263 3. The internal utf-8 flag has no meaning with regards to the encoding
264 of your string.
265 4. A "Unicode String" is simply a string where each character can be
266 validly interpreted as a Unicode code point.
267 If you have UTF-8 encoded data, it is no longer a Unicode string,
268 but a Unicode string encoded in UTF-8, giving you a binary string.
269
270 5. A string containing "high" (> 255) character values is not a UTF-8
271 string.
272 6. Unicode noncharacters only warn, as in core.
273 The 66 Unicode noncharacters U+FDD0..U+FDEF, and U+*FFFE, U+*FFFF
274 just warn, see <http://www.unicode.org/versions/corrigendum9.html>.
275 But illegal surrogate pairs fail to parse.
276
7. Raw non-Unicode characters above U+10FFFF are disallowed.
Raw non-Unicode characters outside the valid Unicode range fail to
parse, because "A string is a sequence of zero or more Unicode
characters" (RFC 7159 section 1) and "JSON text SHALL be encoded in
Unicode" (RFC 7159 section 8.1). We now use the UTF8_DISALLOW_SUPER
flag when parsing Unicode.
283
284 I hope this helps :)
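
The difference between a Unicode (character) string and its UTF-8 encoded
form, as described in the points above, can be made concrete with a short
sketch. It uses the core Encode module; the sample character is arbitrary.

```perl
use strict;
use warnings;
use Encode ();
use Cpanel::JSON::XS;

my $chars = "\x{263A}";                          # one character, ordinal > 255
my $bytes = Encode::encode('UTF-8', $chars);     # the same text as three octets

# Without the utf8 flag, encode returns a character string.
my $unicode_json = Cpanel::JSON::XS->new->encode([$chars]);

# With the utf8 flag, encode returns UTF-8 encoded octets.
my $utf8_json = Cpanel::JSON::XS->new->utf8->encode([$chars]);

printf "%d %d %d %d\n",
    length($chars), length($bytes), length($unicode_json), length($utf8_json);
# 1 3 5 7
```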
285
OBJECT-ORIENTED INTERFACE
287 The object oriented interface lets you configure your own encoding or
288 decoding style, within the limits of supported formats.
289
290 $json = new Cpanel::JSON::XS
291 Creates a new JSON object that can be used to de/encode JSON
292 strings. All boolean flags described below are by default disabled.
293
294 The mutators for flags all return the JSON object again and thus
295 calls can be chained:
296
297 my $json = Cpanel::JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
298 => {"a": [1, 2]}
299
300 $json = $json->ascii ([$enable])
301 $enabled = $json->get_ascii
302 If $enable is true (or missing), then the "encode" method will not
303 generate characters outside the code range 0..127 (which is ASCII).
304 Any Unicode characters outside that range will be escaped using
either a single "\uXXXX" (BMP characters) or a double
"\uHHHH\uLLLL" escape sequence, as per RFC4627. The resulting
307 encoded JSON text can be treated as a native Unicode string, an
308 ascii-encoded, latin1-encoded or UTF-8 encoded string, or any other
309 superset of ASCII.
310
311 If $enable is false, then the "encode" method will not escape
312 Unicode characters unless required by the JSON syntax or other
313 flags. This results in a faster and more compact format.
314
315 See also the section ENCODING/CODESET FLAG NOTES later in this
316 document.
317
318 The main use for this flag is to produce JSON texts that can be
319 transmitted over a 7-bit channel, as the encoded JSON texts will
320 not contain any 8 bit characters.
321
322 Cpanel::JSON::XS->new->ascii (1)->encode ([chr 0x10401])
323 => ["\ud801\udc01"]
324
325 $json = $json->latin1 ([$enable])
326 $enabled = $json->get_latin1
327 If $enable is true (or missing), then the "encode" method will
328 encode the resulting JSON text as latin1 (or ISO-8859-1), escaping
329 any characters outside the code range 0..255. The resulting string
330 can be treated as a latin1-encoded JSON text or a native Unicode
331 string. The "decode" method will not be affected in any way by this
332 flag, as "decode" by default expects Unicode, which is a strict
333 superset of latin1.
334
335 If $enable is false, then the "encode" method will not escape
336 Unicode characters unless required by the JSON syntax or other
337 flags.
338
339 See also the section ENCODING/CODESET FLAG NOTES later in this
340 document.
341
342 The main use for this flag is efficiently encoding binary data as
343 JSON text, as most octets will not be escaped, resulting in a
344 smaller encoded size. The disadvantage is that the resulting JSON
345 text is encoded in latin1 (and must correctly be treated as such
346 when storing and transferring), a rare encoding for JSON. It is
347 therefore most useful when you want to store data structures known
348 to contain binary data efficiently in files or databases, not when
349 talking to other JSON encoders/decoders.
350
Cpanel::JSON::XS->new->latin1->encode (["\x{89}\x{abc}"])
352 => ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not)
353
354 $json = $json->binary ([$enable])
$enabled = $json->get_binary
356 If the $enable argument is true (or missing), then the "encode"
357 method will not try to detect an UTF-8 encoding in any JSON string,
358 it will strictly interpret it as byte sequence. The result might
359 contain new "\xNN" sequences, which is unparsable JSON. The
360 "decode" method forbids "\uNNNN" sequences and accepts "\xNN" and
361 octal "\NNN" sequences.
362
There is also special logic for perl 5.6 and utf8. 5.6 encodes
any string to utf-8 automatically when seeing a codepoint >= 0x80
and < 0x100. With the binary flag enabled, the perl utf8-encoded
string is decoded to the original byte encoding and then encoded
with "\xNN" escapes. This results in the same encodings as with
newer perls. But note that binary multi-byte codepoints with 5.6
will result in "illegal unicode character in binary string" errors,
unlike with newer perls.
371
If $enable is false, then the "encode" method will smartly try to
detect Unicode characters, subject to the JSON syntax and other
flags, and hex and octal sequences are forbidden.
375
376 See also the section ENCODING/CODESET FLAG NOTES later in this
377 document.
378
379 The main use for this flag is to avoid the smart unicode detection
380 and possible double encoding. The disadvantage is that the
381 resulting JSON text is encoded in new "\xNN" and in latin1
382 characters and must correctly be treated as such when storing and
383 transferring, a rare encoding for JSON. It will produce non-
384 readable JSON strings in the browser. It is therefore most useful
385 when you want to store data structures known to contain binary data
386 efficiently in files or databases, not when talking to other JSON
387 encoders/decoders. The binary decoding method can also be used
388 when an encoder produced a non-JSON conformant hex or octal
389 encoding "\xNN" or "\NNN".
390
391 Cpanel::JSON::XS->new->binary->encode (["\x{89}\x{abc}"])
392 5.6: Error: malformed or illegal unicode character in binary string
393 >=5.8: ['\x89\xe0\xaa\xbc']
394
395 Cpanel::JSON::XS->new->binary->encode (["\x{89}\x{bc}"])
396 => ["\x89\xbc"]
397
398 Cpanel::JSON::XS->new->binary->decode (["\x89\ua001"])
399 Error: malformed or illegal unicode character in binary string
400
401 Cpanel::JSON::XS->new->decode (["\x89"])
402 Error: illegal hex character in non-binary string
403
404 $json = $json->utf8 ([$enable])
405 $enabled = $json->get_utf8
406 If $enable is true (or missing), then the "encode" method will
407 encode the JSON result into UTF-8, as required by many protocols,
while the "decode" method expects to be handed a UTF-8-encoded
409 string. Please note that UTF-8-encoded strings do not contain any
410 characters outside the range 0..255, they are thus useful for
411 bytewise/binary I/O. In future versions, enabling this option might
412 enable autodetection of the UTF-16 and UTF-32 encoding families, as
413 described in RFC4627.
414
If $enable is false, then the "encode" method will return the JSON
string as a (non-encoded) Unicode string, while "decode" thus
expects a Unicode string. Any decoding or encoding (e.g. to UTF-8 or
UTF-16) needs to be done yourself, e.g. using the Encode module.
419
420 See also the section ENCODING/CODESET FLAG NOTES later in this
421 document.
422
423 Example, output UTF-16BE-encoded JSON:
424
425 use Encode;
426 $jsontext = encode "UTF-16BE", Cpanel::JSON::XS->new->encode ($object);
427
428 Example, decode UTF-32LE-encoded JSON:
429
430 use Encode;
431 $object = Cpanel::JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
432
433 $json = $json->pretty ([$enable])
434 This enables (or disables) all of the "indent", "space_before" and
435 "space_after" (and in the future possibly more) flags in one call
436 to generate the most readable (or most compact) form possible.
437
438 Example, pretty-print some simple structure:
439
440 my $json = Cpanel::JSON::XS->new->pretty(1)->encode ({a => [1,2]})
441 =>
442 {
443 "a" : [
444 1,
445 2
446 ]
447 }
448
449 $json = $json->indent ([$enable])
450 $enabled = $json->get_indent
451 If $enable is true (or missing), then the "encode" method will use
452 a multiline format as output, putting every array member or
453 object/hash key-value pair into its own line, indenting them
454 properly.
455
456 If $enable is false, no newlines or indenting will be produced, and
457 the resulting JSON text is guaranteed not to contain any
458 "newlines".
459
460 This setting has no effect when decoding JSON texts.
461
462 $json = $json->indent_length([$number_of_spaces])
463 $length = $json->get_indent_length()
Set the indent length (default 3). This option is only useful when
you also enable indent or pretty. The acceptable range is from 0
(no indentation) to 15.
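
A minimal sketch of combining "indent" with "indent_length", using a
two-space indent instead of the default and made-up sample data:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# indent enables the multi-line format; indent_length sets the step width.
my $json = Cpanel::JSON::XS->new->indent->indent_length(2);

my $text = $json->encode([1]);
print $text;

print "indent length: ", $json->get_indent_length, "\n";
```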
467
468 $json = $json->space_before ([$enable])
469 $enabled = $json->get_space_before
470 If $enable is true (or missing), then the "encode" method will add
471 an extra optional space before the ":" separating keys from values
472 in JSON objects.
473
474 If $enable is false, then the "encode" method will not add any
475 extra space at those places.
476
477 This setting has no effect when decoding JSON texts. You will also
478 most likely combine this setting with "space_after".
479
480 Example, space_before enabled, space_after and indent disabled:
481
482 {"key" :"value"}
483
484 $json = $json->space_after ([$enable])
485 $enabled = $json->get_space_after
486 If $enable is true (or missing), then the "encode" method will add
487 an extra optional space after the ":" separating keys from values
488 in JSON objects and extra whitespace after the "," separating key-
489 value pairs and array members.
490
491 If $enable is false, then the "encode" method will not add any
492 extra space at those places.
493
494 This setting has no effect when decoding JSON texts.
495
496 Example, space_before and indent disabled, space_after enabled:
497
498 {"key": "value"}
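
The two spacing flags can be compared side by side. This sketch reuses the
key/value pair from the examples above and adds the combined form:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $data = { key => "value" };

my $before = Cpanel::JSON::XS->new->space_before->encode($data);
my $after  = Cpanel::JSON::XS->new->space_after->encode($data);
my $both   = Cpanel::JSON::XS->new->space_before->space_after->encode($data);

print "$before\n";    # {"key" :"value"}
print "$after\n";     # {"key": "value"}
print "$both\n";      # {"key" : "value"}
```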
499
500 $json = $json->relaxed ([$enable])
501 $enabled = $json->get_relaxed
If $enable is true (or missing), then "decode" will accept some
extensions to normal JSON syntax (see below). "encode" will not be
affected in any way. Be aware that this option makes you accept
invalid JSON texts as if they were valid! I suggest using this
option only to parse application-specific files written by humans
(configuration files, resource files etc.)
508
509 If $enable is false (the default), then "decode" will only accept
510 valid JSON texts.
511
512 Currently accepted extensions are:
513
514 · list items can have an end-comma
515
516 JSON separates array elements and key-value pairs with commas.
517 This can be annoying if you write JSON texts manually and want
518 to be able to quickly append elements, so this extension
519 accepts comma at the end of such items not just between them:
520
521 [
522 1,
523 2, <- this comma not normally allowed
524 ]
525 {
526 "k1": "v1",
527 "k2": "v2", <- this comma not normally allowed
528 }
529
530 · shell-style '#'-comments
531
532 Whenever JSON allows whitespace, shell-style comments are
533 additionally allowed. They are terminated by the first
534 carriage-return or line-feed character, after which more white-
535 space and comments are allowed.
536
537 [
538 1, # this comment not allowed in JSON
539 # neither this one...
540 ]
541
542 · literal ASCII TAB characters in strings
543
Literal ASCII TAB characters are now allowed in strings (and
treated as "\t") in relaxed mode, even though JSON mandates that
a TAB character be written as the "\t" escape sequence.
547
548 [
549 "Hello\tWorld",
550 "Hello<TAB>World", # literal <TAB> would not normally be allowed
551 ]
552
553 · allow_singlequote
554
555 Single quotes are accepted instead of double quotes. See the
556 "allow_singlequote" option.
557
558 { "foo":'bar' }
559 { 'foo':"bar" }
560 { 'foo':'bar' }
561
562 · allow_barekey
563
564 Accept unquoted object keys instead of with mandatory double
565 quotes. See the "allow_barekey" option.
566
567 { foo:"bar" }
568
569 · allow_dupkeys
570
571 Allow decoding of duplicate keys in hashes. By default
572 duplicate keys are forbidden. See
573 <http://seriot.ch/parsing_json.php#24>: RFC 7159 section 4:
574 "The names within an object should be unique." See the
575 "allow_dupkeys" option.
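
A small sketch of relaxed decoding, limited to the first two extensions
listed above (trailing commas and '#' comments). The input text is made up
for illustration. The same text is expected to be rejected by a decoder in
its default strict mode, so that call is wrapped in eval.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Hand-written input with a comment and a trailing comma.
my $text = <<'JSON';
[
  1,
  2,   # trailing comma and comments are relaxed-mode extensions
]
JSON

my $relaxed = Cpanel::JSON::XS->new->relaxed->decode($text);
print scalar(@$relaxed), " elements\n";

# The strict decoder should refuse the same text.
my $strict = eval { Cpanel::JSON::XS->new->decode($text) };
print "strict decode failed\n" unless defined $strict;
```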
576
577 $json = $json->canonical ([$enable])
578 $enabled = $json->get_canonical
579 If $enable is true (or missing), then the "encode" method will
580 output JSON objects by sorting their keys. This is adding a
581 comparatively high overhead.
582
583 If $enable is false, then the "encode" method will output key-value
584 pairs in the order Perl stores them (which will likely change
585 between runs of the same script, and can change even within the
586 same run from 5.18 onwards).
587
588 This option is useful if you want the same data structure to be
589 encoded as the same JSON text (given the same overall settings). If
590 it is disabled, the same hash might be encoded differently even if
it contains the same data, as key-value pairs have no inherent
592 ordering in Perl.
593
594 This setting has no effect when decoding JSON texts.
595
596 This setting has currently no effect on tied hashes.
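
A minimal sketch of "canonical" with made-up data: with the flag enabled,
keys are emitted in sorted order regardless of how Perl stores the hash.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $data = { b => 1, a => 2, c => 3 };

# Sorted keys give the same JSON text on every run.
my $text = Cpanel::JSON::XS->new->canonical->encode($data);

print "$text\n";    # {"a":2,"b":1,"c":3}
```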
597
598 $json = $json->sort_by (undef, 0, 1 or a block)
599 This currently only (un)sets the "canonical" option, and ignores
600 custom sort blocks.
601
602 This setting has no effect when decoding JSON texts.
603
604 This setting has currently no effect on tied hashes.
605
606 $json = $json->escape_slash ([$enable])
607 $enabled = $json->get_escape_slash
According to the JSON grammar, the forward slash character (U+002F)
"/" may be escaped, but it does not need to be. By default strings
are encoded without escaping slashes in all perl JSON encoders.
611
612 If $enable is true (or missing), then "encode" will escape slashes,
613 "\/".
614
615 This setting has no effect when decoding JSON texts.
616
617 $json = $json->unblessed_bool ([$enable])
618 $enabled = $json->get_unblessed_bool
621 If $enable is true (or missing), then "decode" will return Perl
622 non-object boolean variables (1 and 0) for JSON booleans ("true"
623 and "false"). If $enable is false, then "decode" will return
624 "Cpanel::JSON::XS::Boolean" objects for JSON booleans.
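
A short sketch of the difference, assuming a module version that provides
"unblessed_bool" as described above: by default JSON booleans decode to
objects, with the flag they decode to plain scalars.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $text = '[true, false]';

my $objects = Cpanel::JSON::XS->new->decode($text);
my $plain   = Cpanel::JSON::XS->new->unblessed_bool->decode($text);

printf "default:        %s\n", ref($objects->[0]) || 'plain scalar';
printf "unblessed_bool: %s\n", ref($plain->[0])   || 'plain scalar';
```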
625
626 $json = $json->allow_singlequote ([$enable])
627 $enabled = $json->get_allow_singlequote
If $enable is true (or missing), then "decode" will accept strings
quoted with single quotes, which is not valid JSON.

$json->allow_singlequote->decode(q({"foo":'bar'}));
$json->allow_singlequote->decode(q({'foo':"bar"}));
$json->allow_singlequote->decode(q({'foo':'bar'}));

This is also enabled with "relaxed". As with the "relaxed" option,
this option may be used to parse application-specific files
written by humans.
640
641 $json = $json->allow_barekey ([$enable])
642 $enabled = $json->get_allow_barekey
If $enable is true (or missing), then "decode" will accept bare
(unquoted) keys in JSON objects, which is not valid JSON.
647
648 Same as with the "relaxed" option, this option may be used to parse
649 application-specific files written by humans.
650
651 $json->allow_barekey->decode('{foo:"bar"}');
652
653 $json = $json->allow_bignum ([$enable])
654 $enabled = $json->get_allow_bignum
If $enable is true (or missing), then "decode" will convert big
integers that Perl cannot handle as integers into Math::BigInt
objects, and floating point numbers (any) into Math::BigFloat
objects.

Conversely, "encode" converts "Math::BigInt" objects and
"Math::BigFloat" objects into JSON numbers when "allow_blessed" is
enabled.
664
665 $json->allow_nonref->allow_blessed->allow_bignum;
666 $bigfloat = $json->decode('2.000000000000000000000000001');
667 print $json->encode($bigfloat);
668 # => 2.000000000000000000000000001
669
670 See "MAPPING" about the normal conversion of JSON number.
671
672 $json = $json->allow_bigint ([$enable])
673 This option is obsolete and replaced by allow_bignum.
674
675 $json = $json->allow_nonref ([$enable])
676 $enabled = $json->get_allow_nonref
677 If $enable is true (or missing), then the "encode" method can
678 convert a non-reference into its corresponding string, number or
679 null JSON value, which is an extension to RFC4627. Likewise,
680 "decode" will accept those JSON values instead of croaking.
681
682 If $enable is false, then the "encode" method will croak if it
683 isn't passed an arrayref or hashref, as JSON texts must either be
684 an object or array. Likewise, "decode" will croak if given
685 something that is not a JSON object or array.
686
687 Example, encode a Perl scalar as JSON value with enabled
688 "allow_nonref", resulting in an invalid JSON text:
689
690 Cpanel::JSON::XS->new->allow_nonref->encode ("Hello, World!")
691 => "Hello, World!"
692
693 $json = $json->allow_unknown ([$enable])
694 $enabled = $json->get_allow_unknown
695 If $enable is true (or missing), then "encode" will not throw an
696 exception when it encounters values it cannot represent in JSON
697 (for example, filehandles) but instead will encode a JSON "null"
value. Note that blessed objects are not included here and are
handled separately by "allow_blessed".
700
701 If $enable is false (the default), then "encode" will throw an
702 exception when it encounters anything it cannot encode as JSON.
703
704 This option does not affect "decode" in any way, and it is
705 recommended to leave it off unless you know your communications
706 partner.
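
A minimal sketch of "allow_unknown", using a code reference as a made-up
example of a value that has no JSON representation:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $data = [ sub { 1 } ];    # a code reference cannot be represented in JSON

# By default, encode throws an exception for such a value.
my $strict = eval { Cpanel::JSON::XS->new->encode($data) };
print "default encode failed\n" unless defined $strict;

# With allow_unknown, the value is encoded as null instead.
my $lenient = Cpanel::JSON::XS->new->allow_unknown->encode($data);
print "$lenient\n";    # [null]
```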
707
708 $json = $json->allow_stringify ([$enable])
709 $enabled = $json->get_allow_stringify
710 If $enable is true (or missing), then "encode" will stringify the
711 non-object perl value or reference. Note that blessed objects are
712 not included here and are handled separately by "allow_blessed" and
713 "convert_blessed". String references are stringified to the string
714 value, other references as in perl.
715
716 This option does not affect "decode" in any way.
717
718 This option is special to this module, it is not supported by other
719 encoders. So it is not recommended to use it.
720
721 $json = $json->require_types ([$enable])
$enabled = $json->get_require_types
If $enable is true (or missing), then "encode" will require a
second argument supplying the JSON types. See Cpanel::JSON::XS::Type.
When the second argument is not provided (or is undef), "encode"
croaks. It also croaks when the type for the provided structure in
"encode" is incomplete.
730
731 $json = $json->allow_dupkeys ([$enable])
732 $enabled = $json->get_allow_dupkeys
733 If $enable is true (or missing), then the "decode" method will not
734 die when it encounters duplicate keys in a hash. "allow_dupkeys"
735 is also enabled in the "relaxed" mode.
736
The JSON spec allows duplicate names in objects but recommends
against them. Perl hashes cannot hold duplicate keys, so when
duplicates are allowed, decoding silently keeps only the last value
found for each name.
741
742 See <http://seriot.ch/parsing_json.php#24>: RFC 7159 section 4:
743 "The names within an object should be unique."
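
A small sketch of "allow_dupkeys" with a made-up input. As described
above, the last value for a repeated name is the one that survives, and
the default decoder is expected to reject the same text, so that call is
wrapped in eval.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $text = '{"a":1,"a":2}';

# With allow_dupkeys the duplicate is tolerated; the last value is kept.
my $hash = Cpanel::JSON::XS->new->allow_dupkeys->decode($text);
print "a = $hash->{a}\n";

# Without the flag, duplicate keys should make decode die.
my $strict = eval { Cpanel::JSON::XS->new->decode($text) };
print "default decode failed\n" unless defined $strict;
```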
744
745 $json = $json->allow_blessed ([$enable])
746 $enabled = $json->get_allow_blessed
747 If $enable is true (or missing), then the "encode" method will not
748 barf when it encounters a blessed reference. Instead, the value of
749 the convert_blessed option will decide whether "null"
750 ("convert_blessed" disabled or no "TO_JSON" method found) or a
751 representation of the object ("convert_blessed" enabled and
752 "TO_JSON" method found) is being encoded. Has no effect on
753 "decode".
754
755 If $enable is false (the default), then "encode" will throw an
756 exception when it encounters a blessed object.
757
758 This setting has no effect on "decode".
759
760 $json = $json->convert_blessed ([$enable])
761 $enabled = $json->get_convert_blessed
762 If $enable is true (or missing), then "encode", upon encountering a
763 blessed object, will check for the availability of the "TO_JSON"
764 method on the object's class. If found, it will be called in scalar
765 context and the resulting scalar will be encoded instead of the
766 object. If no "TO_JSON" method is found, a stringification overload
767 method is tried next. If both are not found, the value of
768 "allow_blessed" will decide what to do.
769
770 The "TO_JSON" method may safely call die if it wants. If "TO_JSON"
771 returns other blessed objects, those will be handled in the same
772 way. "TO_JSON" must take care of not causing an endless recursion
773 cycle (== crash) in this case. The same care must be taken with
774 calling encode in stringify overloads (even if this works by luck
775 in older perls) or other callbacks. The name of "TO_JSON" was
776 chosen because other methods called by the Perl core (== not by the
777 user of the object) are usually in upper case letters and to avoid
778 collisions with any "to_json" function or method.
779
780 If $enable is false (the default), then "encode" will not consider
781 this type of conversion.
782
783 This setting has no effect on "decode".
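
The "TO_JSON" hook can be sketched with a small class. My::Point is a
hypothetical class made up for this illustration, and "canonical" is
enabled only to make the key order predictable.

```perl
use strict;
use warnings;

package My::Point;    # hypothetical example class

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Called by encode when convert_blessed is enabled; returns plain data.
sub TO_JSON {
    my ($self) = @_;
    return { x => $self->{x}, y => $self->{y} };
}

package main;

use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new->canonical->convert_blessed;
my $text = $json->encode([ My::Point->new(x => 1, y => 2) ]);

print "$text\n";    # [{"x":1,"y":2}]
```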
784
785 $json = $json->allow_tags ([$enable])
786 $enabled = $json->get_allow_tags
787 See "OBJECT SERIALIZATION" for details.
788
789 If $enable is true (or missing), then "encode", upon encountering a
790 blessed object, will check for the availability of the "FREEZE"
791 method on the object's class. If found, it will be used to
792 serialize the object into a nonstandard tagged JSON value (that
793 JSON decoders cannot decode).
794
795 It also causes "decode" to parse such tagged JSON values and
796 deserialize them via a call to the "THAW" method.
797
798 If $enable is false (the default), then "encode" will not consider
799 this type of conversion, and tagged JSON values will cause a parse
800 error in "decode", as if tags were not part of the grammar.
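
A round trip through tagged values might look like the sketch below. "My::Temp" and its fields are hypothetical names chosen for illustration; only the "FREEZE" and "THAW" method names and their argument order come from this documentation.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Hypothetical class, used only for this illustration.
{
    package My::Temp;
    sub new { my ($class, %args) = @_; return bless { %args }, $class }

    # Serializer side: return the constructor arguments as a list.
    sub FREEZE { my ($self, $serializer) = @_; return ($self->{unit}, $self->{value}) }

    # Deserializer side: receives the class name, the serializer name
    # and the values that FREEZE returned.
    sub THAW {
        my ($class, $serializer, $unit, $value) = @_;
        return $class->new (unit => $unit, value => $value);
    }
}

my $json = Cpanel::JSON::XS->new->allow_tags;
my $text = $json->encode ([ My::Temp->new (unit => "C", value => 21) ]);
my $copy = $json->decode ($text)->[0];

print ref ($copy), " $copy->{value}$copy->{unit}\n";
```

The intermediate text is not standard JSON, so this only suits cases where both ends enable "allow_tags".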
801
802 $json = $json->filter_json_object ([$coderef->($hashref)])
803 When $coderef is specified, it will be called from "decode" each
804 time it decodes a JSON object. The only argument is a reference to
805 the newly-created hash. If the code reference returns a single
806 scalar (which need not be a reference), this value (i.e. a copy of
807 that scalar to avoid aliasing) is inserted into the deserialized
808 data structure. If it returns an empty list (NOTE: not "undef",
809 which is a valid scalar), the original deserialized hash will be
810 inserted. This setting can slow down decoding considerably.
811
812 When $coderef is omitted or undefined, any existing callback will
813 be removed and "decode" will not change the deserialized hash in
814 any way.
815
816 Example, convert all JSON objects into the integer 5:
817
818 my $js = Cpanel::JSON::XS->new->filter_json_object (sub { 5 });
819 # returns [5]
820 $js->decode ('[{}]');
821 # throws an exception because allow_nonref is not enabled,
822 # so a lone 5 is not allowed.
823 $js->decode ('{"a":1, "b":2}');
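
The difference between returning one scalar and returning the empty list can be sketched as follows. The "celsius" key is an assumption made up for this example:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new->filter_json_object (sub {
    my ($hash) = @_;

    # One scalar returned: it replaces the decoded hash.
    return $hash->{celsius} * 9 / 5 + 32 if exists $hash->{celsius};

    # Empty list returned: the original hash is kept.
    return;
});

my $data = $json->decode ('[{"celsius":100},{"name":"kettle"}]');

print "$data->[0]\n";         # 212
print "$data->[1]{name}\n";   # kettle
```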
824
825 $json = $json->filter_json_single_key_object ($key [=>
826 $coderef->($value)])
827 Works somewhat similarly to "filter_json_object", but is only called
828 for JSON objects having a single key named $key.
829
830 This $coderef is called before the one specified via
831 "filter_json_object", if any. It gets passed the single value in
832 the JSON object. If it returns a single value, it will be inserted
833 into the data structure. If it returns nothing (not even "undef"
834 but the empty list), the callback from "filter_json_object" will be
835 called next, as if no single-key callback were specified.
836
837 If $coderef is omitted or undefined, the corresponding callback
838 will be disabled. There can only ever be one callback for a given
839 key.
840
841 As this callback gets called less often than the
842 "filter_json_object" one, decoding speed will not usually suffer as
843 much. Therefore, single-key objects make excellent targets to
844 serialize Perl objects into, especially as single-key JSON objects
845 are as close to the type-tagged value concept as JSON gets (it's
846 basically an ID/VALUE tuple). Of course, JSON does not support this
847 in any way, so you need to make sure your data never looks like a
848 serialized Perl hash.
849
850 Typical names for the single object key are "__class_whatever__",
851 or "$__dollars_are_rarely_used__$" or "}ugly_brace_placement", or
852 even things like "__class_md5sum(classname)__", to reduce the risk
853 of clashing with real hashes.
854
855 Example, decode JSON objects of the form "{ "__widget__" => <id> }"
856 into the corresponding $WIDGET{<id>} object:
857
858 # return whatever is in $WIDGET{5}:
859 Cpanel::JSON::XS
860 ->new
861 ->filter_json_single_key_object (__widget__ => sub {
862 $WIDGET{ $_[0] }
863 })
864 ->decode ('{"__widget__": 5}')
865
866 # this can be used with a TO_JSON method in some "widget" class
867 # for serialization to json:
868 sub WidgetBase::TO_JSON {
869 my ($self) = @_;
870
871 unless ($self->{id}) {
872 $self->{id} = ..get..some..id..;
873 $WIDGET{$self->{id}} = $self;
874 }
875
876 { __widget__ => $self->{id} }
877 }
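
For a version of the widget example that runs end to end, here is a minimal sketch. It assumes a hypothetical "Widget" class that registers itself in a lexical %WIDGET table at construction time, rather than lazily as above:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my %WIDGET;   # registry mapping id => object (an assumption of this sketch)

# Hypothetical class, used only for this illustration.
{
    package Widget;
    sub new {
        my ($class, $id) = @_;
        my $self = bless { id => $id }, $class;
        $WIDGET{$id} = $self;
        return $self;
    }

    # Serialize as a single-key object carrying only the id.
    sub TO_JSON { my ($self) = @_; return { __widget__ => $self->{id} } }
}

my $json = Cpanel::JSON::XS->new
    ->convert_blessed
    ->filter_json_single_key_object (__widget__ => sub { $WIDGET{ $_[0] } });

my $widget = Widget->new (5);
my $text   = $json->encode ([$widget]);
print "$text\n";   # [{"__widget__":5}]

# Decoding maps the single-key object back to the registered widget.
my $back = $json->decode ($text)->[0];
print ref ($back), "\n";   # Widget
```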
878
879 $json = $json->shrink ([$enable])
880 $enabled = $json->get_shrink
881 Perl usually over-allocates memory a bit when allocating space for
882 strings. This flag optionally resizes strings generated by either
883 "encode" or "decode" to their minimum size possible. This can save
884 memory when your JSON texts are either very very long or you have
885 many short strings. It will also try to downgrade any strings to
886 octet-form if possible: perl stores strings internally either in an
887 encoding called UTF-X or in octet-form. The latter cannot store
888 everything but uses less space in general (and some buggy Perl or C
889 code might even rely on that internal representation being used).
890
891 The actual definition of what shrink does might change in future
892 versions, but it will always try to save space at the expense of
893 time.
894
895 If $enable is true (or missing), the string returned by "encode"
896 will be shrunk-to-fit, while all strings generated by "decode" will
897 also be shrunk-to-fit.
898
899 If $enable is false, then the normal perl allocation algorithms are
900 used. If you work with your data, then this is likely to be
901 faster.
902
903 In the future, this setting might control other things, such as
904 converting strings that look like integers or floats into integers
905 or floats internally (there is no difference on the Perl level),
906 saving space.
907
908 $json = $json->max_depth ([$maximum_nesting_depth])
909 $max_depth = $json->get_max_depth
910 Sets the maximum nesting level (default 512) accepted while
911 encoding or decoding. If a higher nesting level is detected in JSON
912 text or a Perl data structure, then the encoder and decoder will
913 stop and croak at that point.
914
915 Nesting level is defined by number of hash- or arrayrefs that the
916 encoder needs to traverse to reach a given point or the number of
917 "{" or "[" characters without their matching closing bracket
918 crossed to reach a given character in a string.
919
920 Setting the maximum depth to one disallows any nesting, so that
921 ensures that the object is only a single hash/object or array.
922
923 If no argument is given, the highest possible setting will be used,
924 which is rarely useful.
925
926 Note that nesting is implemented by recursion in C. The default
927 value has been chosen to be as large as typical operating systems
928 allow without crashing.
929
930 See SECURITY CONSIDERATIONS, below, for more info on why this is
931 useful.
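
A short sketch of the limit in action, using an arbitrarily chosen depth of 2 and "eval" to catch the exception:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new->max_depth (2);

# Two levels of nesting are within the limit.
my $shallow = $json->decode ('[[1]]');

# Three levels exceed it, so decode croaks.
my $deep = eval { $json->decode ('[[[1]]]') };
my $error = $@;

print defined $deep ? "decoded\n" : "rejected: $error";
```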
932
933 $json = $json->max_size ([$maximum_string_size])
934 $max_size = $json->get_max_size
935 Set the maximum length a JSON text may have (in bytes) where
936 decoding is being attempted. The default is 0, meaning no limit.
937 When "decode" is called on a string that is longer than this many
938 bytes, it will not attempt to decode the string but throw an
939 exception. This setting has no effect on "encode" (yet).
940
941 If no argument is given, the limit check will be deactivated (same
942 as when 0 is specified).
943
944 See "SECURITY CONSIDERATIONS", below, for more info on why this is
945 useful.
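
The following sketch uses an arbitrary limit of 10 bytes to show that the check happens before any parsing is attempted:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new->max_size (10);

# 7 bytes: within the limit.
my $small = $json->decode ('[1,2,3]');

# 13 bytes: decode refuses to look at it and croaks.
my $large = eval { $json->decode ('[1,2,3,4,5,6]') };
my $error = $@;

print defined $large ? "decoded\n" : "rejected: $error";
```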
946
947 $json->stringify_infnan ([$infnan_mode = 1])
948 $infnan_mode = $json->get_stringify_infnan
949 Get or set how Cpanel::JSON::XS encodes "inf", "-inf" or "nan" for
950 numeric values. Also qnan, snan or negative nan on some platforms.
951
952 "null": infnan_mode = 0. Similar to most JSON modules in other
953 languages. Always null.
954
955 stringified: infnan_mode = 1. As in Mojo::JSON. Platform specific
956 strings. Stringified via sprintf(%g), with double quotes.
957
958 inf/nan: infnan_mode = 2. As in JSON::XS, and older releases.
959 Passes through platform dependent values, invalid JSON. Stringified
960 via sprintf(%g), but without double quotes.
961
962 "inf/-inf/nan": infnan_mode = 3. Platform independent inf/nan/-inf
963 strings. No QNAN/SNAN/negative NAN support, unified to "nan". Much
964 easier to detect, but may conflict with valid strings.
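
As a rough illustration of two of the modes, the sketch below encodes an infinity (produced here by the overflowing expression 9**9**9, which assumes IEEE floating point) with mode 0 and with mode 3. Modes 1 and 2 are left out because their output is platform specific.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $inf = 9**9**9;   # overflows to infinity on IEEE platforms

# Mode 0: always null.
my $as_null = Cpanel::JSON::XS->new->stringify_infnan (0)->encode ([$inf]);

# Mode 3: platform-independent inf/-inf/nan spelling.
my $as_string = Cpanel::JSON::XS->new->stringify_infnan (3)->encode ([$inf]);

print "$as_null\n";     # [null]
print "$as_string\n";
```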
965
966 $json_text = $json->encode ($perl_scalar, $json_type)
967 Converts the given Perl data structure (a simple scalar or a
968 reference to a hash or array) to its JSON representation. Simple
969 scalars will be converted into JSON string or number sequences,
970 while references to arrays become JSON arrays and references to
971 hashes become JSON objects. Undefined Perl values (e.g. "undef")
972 become JSON "null" values. Neither "true" nor "false" values will
973 be generated.
974
975 For the type argument see Cpanel::JSON::XS::Type.
976
977 $perl_scalar = $json->decode ($json_text, my $json_type)
978 The opposite of "encode": expects a JSON text and tries to parse
979 it, returning the resulting simple scalar or reference. Croaks on
980 error.
981
982 JSON numbers and strings become simple Perl scalars. JSON arrays
983 become Perl arrayrefs and JSON objects become Perl hashrefs. "true"
984 becomes 1, "false" becomes 0 and "null" becomes "undef".
985
986 For the optional type argument see Cpanel::JSON::XS::Type.
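
A minimal round trip through "encode" and "decode" could look like this. The data is made up, and "canonical" is enabled only so that the key order in the output is predictable:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new->canonical;

# Hash -> object, array -> array, undef -> null.
my $text = $json->encode ({ name => "widget", tags => ["a", "b"], parent => undef });
print "$text\n";   # {"name":"widget","parent":null,"tags":["a","b"]}

# And back again: object -> hashref, array -> arrayref, null -> undef.
my $data = $json->decode ($text);
print scalar @{ $data->{tags} }, "\n";   # 2
```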
987
988 ($perl_scalar, $characters) = $json->decode_prefix ($json_text)
989 This works like the "decode" method, but instead of raising an
990 exception when there is trailing garbage after the first JSON
991 object, it will silently stop parsing there and return the number
992 of characters consumed so far.
993
994 This is useful if your JSON texts are not delimited by an outer
995 protocol and you need to know where the JSON text ends.
996
997 Cpanel::JSON::XS->new->decode_prefix ("[1] the tail")
998 => ([1], 3)
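
The returned character count can be used to split off the unparsed remainder, as in this small sketch built on the example above:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $text = '[1] the tail';

my ($data, $consumed) = Cpanel::JSON::XS->new->decode_prefix ($text);

# Everything after the consumed characters is the unparsed remainder.
my $tail = substr $text, $consumed;

print "consumed $consumed characters, tail is '$tail'\n";
```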
999
1000 $json->to_json ($perl_hash_or_arrayref)
1001 Deprecated method for perl 5.8 and newer. Use encode_json instead.
1002
1003 $json->from_json ($utf8_encoded_json_text)
1004 Deprecated method for perl 5.8 and newer. Use decode_json instead.
1005
1006 INCREMENTAL PARSING
1007 In some cases, there is the need for incremental parsing of JSON texts.
1008 While this module always has to keep both JSON text and resulting Perl
1009 data structure in memory at one time, it does allow you to parse a JSON
1010 stream incrementally. It does so by accumulating text until it has a
1011 full JSON object, which it then can decode. This process is similar to
1012 using "decode_prefix" to see if a full JSON object is available, but is
1013 much more efficient (and can be implemented with a minimum of method
1014 calls).
1015
1016 Cpanel::JSON::XS will only attempt to parse the JSON text once it is
1017 sure it has enough text to get a decisive result, using a very simple
1018 but truly incremental parser. This means that it sometimes won't stop
1019 as early as the full parser, for example, it doesn't detect mismatched
1020 parentheses. The only thing it guarantees is that it starts decoding as
1021 soon as a syntactically valid JSON text has been seen. This means you
1022 need to set resource limits (e.g. "max_size") to ensure the parser will
1023 stop parsing in the presence of syntax errors.
1024
1025 The following methods implement this incremental parser.
1026
1027 [void, scalar or list context] = $json->incr_parse ([$string])
1028 This is the central parsing function. It can both append new text
1029 and extract objects from the stream accumulated so far (both of
1030 these functions are optional).
1031
1032 If $string is given, then this string is appended to the already
1033 existing JSON fragment stored in the $json object.
1034
1035 After that, if the function is called in void context, it will
1036 simply return without doing anything further. This can be used to
1037 add more text in as many chunks as you want.
1038
1039 If the method is called in scalar context, then it will try to
1040 extract exactly one JSON object. If that is successful, it will
1041 return this object, otherwise it will return "undef". If there is a
1042 parse error, this method will croak just as "decode" would do (one
1043 can then use "incr_skip" to skip the erroneous part). This is the
1044 most common way of using the method.
1045
1046 And finally, in list context, it will try to extract as many
1047 objects from the stream as it can find and return them, or the
1048 empty list otherwise. For this to work, there must be no separators
1049 between the JSON objects or arrays, instead they must be
1050 concatenated back-to-back. If an error occurs, an exception will be
1051 raised as in the scalar context case. Note that in this case, any
1052 previously-parsed JSON texts will be lost.
1053
1054 Example: Parse some JSON arrays/objects in a given string and
1055 return them.
1056
1057 my @objs = Cpanel::JSON::XS->new->incr_parse ("[5][7][1,2]");
1058
1059 $lvalue_string = $json->incr_text (>5.8 only)
1060 This method returns the currently stored JSON fragment as an
1061 lvalue, that is, you can manipulate it. This only works 1. when a
1062 preceding call to "incr_parse" in scalar context successfully
1063 returned an object, and 2. only with Perl >= 5.8.
1064
1065 Under all other circumstances you must not call this function (I
1066 mean it: although in simple tests it might actually work, it will
1067 fail under real world conditions). As a special exception, you can
1068 also call this method before having parsed anything.
1069
1070 This function is useful in two cases: a) finding the trailing text
1071 after a JSON object or b) parsing multiple JSON objects separated
1072 by non-JSON text (such as commas).
1073
1074 $json->incr_skip
1075 This will reset the state of the incremental parser and will remove
1076 the parsed text from the input buffer so far. This is useful after
1077 "incr_parse" died, in which case the input buffer and incremental
1078 parser state are left unchanged, to skip the text parsed so far and
1079 to reset the parse state.
1080
1081 The difference from "incr_reset" is that only text until the parse
1082 error occurred is removed.
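
A sketch of recovering from a parse error with "incr_skip": the stray "]" makes "incr_parse" croak, and skipping it lets the same parser object continue with later input.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new;

# A lone closing bracket is malformed, so incr_parse croaks.
my $bad   = eval { $json->incr_parse ("]") };
my $error = $@;

# Drop the offending text and reset the parser state.
$json->incr_skip;

# The parser is usable again.
my $good = $json->incr_parse ("[42]");
print "recovered: $good->[0]\n";
```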
1083
1084 $json->incr_reset
1085 This completely resets the incremental parser, that is, after this
1086 call, it will be as if the parser had never parsed anything.
1087
1088 This is useful if you want to repeatedly parse JSON objects and
1089 want to ignore any trailing data, which means you have to reset the
1090 parser after each successful decode.
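
For instance, the following sketch discards the trailing data after the first array before feeding in the next text; without the reset, the leftover text would still be in the buffer in front of the second array.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new;

# Scalar context: extract one array, leaving " trailing junk" buffered.
my $first = $json->incr_parse ('[1] trailing junk');

# Forget everything that is still buffered.
$json->incr_reset;

# Start over with fresh input.
my $second = $json->incr_parse ('[2]');

print "$first->[0] $second->[0]\n";   # 1 2
```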
1091
1092 LIMITATIONS
1093 All options that affect decoding are supported, except "allow_nonref".
1094 The reason for this is that it cannot be made to work sensibly: JSON
1095 objects and arrays are self-delimited, i.e. you can concatenate them
1096 back to back and still decode them perfectly. This does not hold true
1097 for JSON numbers, however.
1098
1099 For example, is the string 1 a single JSON number, or is it simply the
1100 start of 12? Or is 12 a single JSON number, or the concatenation of 1
1101 and 2? In neither case can you tell, and this is why Cpanel::JSON::XS
1102 takes the conservative route and disallows this case.
1103
1104 EXAMPLES
1105 Some examples will make all this clearer. First, a simple example that
1106 works similarly to "decode_prefix": We want to decode the JSON object
1107 at the start of a string and identify the portion after the JSON
1108 object:
1109
1110 my $text = "[1,2,3] hello";
1111
1112 my $json = Cpanel::JSON::XS->new;
1113
1114 my $obj = $json->incr_parse ($text)
1115 or die "expected JSON object or array at beginning of string";
1116
1117 my $tail = $json->incr_text;
1118 # $tail now contains " hello"
1119
1120 Easy, isn't it?
1121
1122 Now for a more complicated example: Imagine a hypothetical protocol
1123 where you read some requests from a TCP stream, and each request is a
1124 JSON array, without any separation between them (in fact, it is often
1125 useful to use newlines as "separators", as these get interpreted as
1126 whitespace at the start of the JSON text, which makes it possible to
1127 test said protocol with "telnet"...).
1128
1129 Here is how you'd do it (it is trivial to write this in an event-based
1130 manner):
1131
1132 my $json = Cpanel::JSON::XS->new;
1133
1134 # read some data from the socket
1135 while (sysread $socket, my $buf, 4096) {
1136
1137 # split and decode as many requests as possible
1138 for my $request ($json->incr_parse ($buf)) {
1139 # act on the $request
1140 }
1141 }
1142
1143 Another complicated example: Assume you have a string with JSON objects
1144 or arrays, all separated by (optional) comma characters (e.g. "[1],[2],
1145 [3]"). To parse them, we have to skip the commas between the JSON
1146 texts, and here is where the lvalue-ness of "incr_text" comes in
1147 useful:
1148
1149 my $text = "[1],[2], [3]";
1150 my $json = Cpanel::JSON::XS->new;
1151
1152 # void context, so no parsing done
1153 $json->incr_parse ($text);
1154
1155 # now extract as many objects as possible. note the
1156 # use of scalar context so incr_text can be called.
1157 while (my $obj = $json->incr_parse) {
1158 # do something with $obj
1159
1160 # now skip the optional comma
1161 $json->incr_text =~ s/^ \s* , //x;
1162 }
1163
1164 Now let's go for a very complex example: Assume that you have a gigantic
1165 JSON array-of-objects, many gigabytes in size, and you want to parse
1166 it, but you cannot load it into memory fully (this has actually
1167 happened in the real world :).
1168
1169 Well, you lost, you have to implement your own JSON parser. But
1170 Cpanel::JSON::XS can still help you: You implement a (very simple)
1171 array parser and let JSON decode the array elements, which are all full
1172 JSON objects on their own (this wouldn't work if the array elements
1173 could be JSON numbers, for example):
1174
1175 my $json = Cpanel::JSON::XS->new;
1176
1177 # open the monster
1178 open my $fh, "<", "bigfile.json"
1179 or die "bigfile: $!";
1180
1181 # first parse the initial "["
1182 for (;;) {
1183 sysread $fh, my $buf, 65536
1184 or die "read error: $!";
1185 $json->incr_parse ($buf); # void context, so no parsing
1186
1187 # Exit the loop once we found and removed(!) the initial "[".
1188 # In essence, we are (ab-)using the $json object as a simple scalar
1189 # we append data to.
1190 last if $json->incr_text =~ s/^ \s* \[ //x;
1191 }
1192
1193 # now we have skipped the initial "[", so continue
1194 # parsing all the elements.
1195 for (;;) {
1196 # in this loop we read data until we got a single JSON object
1197 for (;;) {
1198 if (my $obj = $json->incr_parse) {
1199 # do something with $obj
1200 last;
1201 }
1202
1203 # add more data
1204 sysread $fh, my $buf, 65536
1205 or die "read error: $!";
1206 $json->incr_parse ($buf); # void context, so no parsing
1207 }
1208
1209 # in this loop we read data until we either found and parsed the
1210 # separating "," between elements, or the final "]"
1211 for (;;) {
1212 # first skip whitespace
1213 $json->incr_text =~ s/^\s*//;
1214
1215 # if we find "]", we are done
1216 if ($json->incr_text =~ s/^\]//) {
1217 print "finished.\n";
1218 exit;
1219 }
1220
1221 # if we find ",", we can continue with the next element
1222 if ($json->incr_text =~ s/^,//) {
1223 last;
1224 }
1225
1226 # if we find anything else, we have a parse error!
1227 if (length $json->incr_text) {
1228 die "parse error near ", $json->incr_text;
1229 }
1230
1231 # else add more data
1232 sysread $fh, my $buf, 65536
1233 or die "read error: $!";
1234 $json->incr_parse ($buf); # void context, so no parsing
1235 }
} # closes the outer element loop
1236
1237 This is a complex example, but most of the complexity comes from the
1238 fact that we are trying to be correct (bear with me if I am wrong, I
1239 never ran the above example :).
1240
1241 BOM
1242 Detects all Unicode Byte Order Marks on decode: UTF-8, UTF-16LE,
1243 UTF-16BE, UTF-32LE and UTF-32BE.
1244
1245 The BOM encoding is set only for one specific decode call, it does not
1246 change the state of the JSON object.
1247
1248 Warning: With perls older than 5.20 you need to load the Encode module
1249 before decoding a multibyte BOM, i.e. >= UTF-16. Otherwise an error is
1250 thrown. This is an implementation limitation and might get fixed later.
1251
1252 See <https://tools.ietf.org/html/rfc7159#section-8.1> "JSON text SHALL
1253 be encoded in UTF-8, UTF-16, or UTF-32."
1254
1255 "Implementations MUST NOT add a byte order mark to the beginning of a
1256 JSON text", "implementations (...) MAY ignore the presence of a byte
1257 order mark rather than treating it as an error".
1258
1259 See also <http://www.unicode.org/faq/utf_bom.html#BOM>.
1260
1261 Beware that Cpanel::JSON::XS is currently the only JSON module which
1262 does accept and decode a BOM.
1263
1264 The latest JSON spec
1265 <https://www.greenbytes.de/tech/webdav/rfc8259.html#character.encoding>
1266 forbids the usage of UTF-16 or UTF-32; the character encoding is UTF-8.
1267 Thus in subsequent updates BOMs of UTF-16 or UTF-32 will throw an
1268 error.
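
As a small sketch of the UTF-8 case, which needs no extra modules, the text below is prefixed with the three BOM bytes EF BB BF and decoded with the functional interface:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# UTF-8 byte order mark followed by an ordinary JSON array.
my $text = "\xEF\xBB\xBF" . '["bom"]';

# The BOM is detected and skipped for this one decode call.
my $data = decode_json ($text);

print "$data->[0]\n";
```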
1269
1270 MAPPING
1271 This section describes how Cpanel::JSON::XS maps Perl values to JSON
1272 values and vice versa. These mappings are designed to "do the right
1273 thing" in most circumstances automatically, preserving round-tripping
1274 characteristics (what you put in comes out as something equivalent).
1275
1276 For the more enlightened: note that in the following descriptions,
1277 lowercase perl refers to the Perl interpreter, while uppercase Perl
1278 refers to the abstract Perl language itself.
1279
1280 JSON -> PERL
1281 object
1282 A JSON object becomes a reference to a hash in Perl. No ordering of
1283 object keys is preserved (JSON does not preserve object key
1284 ordering itself).
1285
1286 array
1287 A JSON array becomes a reference to an array in Perl.
1288
1289 string
1290 A JSON string becomes a string scalar in Perl - Unicode codepoints
1291 in JSON are represented by the same codepoints in the Perl string,
1292 so no manual decoding is necessary.
1293
1294 number
1295 A JSON number becomes either an integer, numeric (floating point)
1296 or string scalar in perl, depending on its range and any fractional
1297 parts. On the Perl level, there is no difference between those as
1298 Perl handles all the conversion details, but an integer may take
1299 slightly less memory and might represent more values exactly than
1300 floating point numbers.
1301
1302 If the number consists of digits only, Cpanel::JSON::XS will try to
1303 represent it as an integer value. If that fails, it will try to
1304 represent it as a numeric (floating point) value if that is
1305 possible without loss of precision. Otherwise it will preserve the
1306 number as a string value (in which case you lose roundtripping
1307 ability, as the JSON number will be re-encoded to a JSON string).
1308
1309 Numbers containing a fractional or exponential part will always be
1310 represented as numeric (floating point) values, possibly at a loss
1311 of precision (in which case you might lose perfect roundtripping
1312 ability, but the JSON number will still be re-encoded as a JSON
1313 number).
1314
1315 Note that precision is not accuracy - binary floating point values
1316 cannot represent most decimal fractions exactly, and when
1317 converting from and to floating point, "Cpanel::JSON::XS" only
1318 guarantees precision up to but not including the least significant
1319 bit.
1320
1321 true, false
1322 When "unblessed_bool" is set to true, then JSON "true" becomes 1
1323 and JSON "false" becomes 0.
1324
1325 Otherwise these JSON atoms become "Cpanel::JSON::XS::true" and
1326 "Cpanel::JSON::XS::false", respectively. They are
1327 "JSON::PP::Boolean" objects and are overloaded to act almost
1328 exactly like the numbers 1 and 0. You can check whether a scalar is
1329 a JSON boolean by using the "Cpanel::JSON::XS::is_bool" function.
1330
1331 The other way round, from perl to JSON, "!0" which is represented as
1332 "yes" becomes "true", and "!1" which is represented as "no" becomes
1333 "false".
1334
1335 Via Cpanel::JSON::XS::Type you can now even force negation in
1336 "encode", without overloading of "!":
1337
1338 my $false = Cpanel::JSON::XS::false;
1339 print($json->encode([!$false], [JSON_TYPE_BOOL]));
1340 => [true]
1341
1342 null
1343 A JSON null atom becomes "undef" in Perl.
1344
1345 shell-style comments ("# text")
1346 As a nonstandard extension to the JSON syntax that is enabled by
1347 the "relaxed" setting, shell-style comments are allowed. They can
1348 start anywhere outside strings and go till the end of the line.
1349
1350 tagged values ("(tag)value").
1351 Another nonstandard extension to the JSON syntax, enabled with the
1352 "allow_tags" setting, are tagged values. In this implementation,
1353 the tag must be a perl package/class name encoded as a JSON string,
1354 and the value must be a JSON array encoding optional constructor
1355 arguments.
1356
1357 See "OBJECT SERIALIZATION", below, for details.
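
To tie the "true, false" and "null" mappings above together, the sketch below classifies decoded values with "Cpanel::JSON::XS::is_bool". It assumes the default settings, that is, "unblessed_bool" is not enabled.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $data = decode_json ('[true,false,null]');

my @kinds;
for my $value (@$data) {
    push @kinds,
          !defined $value                     ? "null"
        : Cpanel::JSON::XS::is_bool ($value)  ? ($value ? "true" : "false")
        :                                       "other";
}

print "@kinds\n";   # true false null
```

Because the boolean objects are overloaded, they can be tested directly in boolean context, as the inner conditional does.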
1358
1359 PERL -> JSON
1360 The mapping from Perl to JSON is slightly more difficult, as Perl is a
1361 truly typeless language, so we can only guess which JSON type is meant
1362 by a Perl value.
1363
1364 hash references
1365 Perl hash references become JSON objects. As there is no inherent
1366 ordering in hash keys (or JSON objects), they will usually be
1367 encoded in a pseudo-random order that can change between runs of
1368 the same program but stays generally the same within a single run
1369 of a program. Cpanel::JSON::XS can optionally sort the hash keys
1370 (determined by the canonical flag), so the same datastructure will
1371 serialize to the same JSON text (given same settings and version of
1372 Cpanel::JSON::XS), but this incurs a runtime overhead and is only
1373 rarely useful, e.g. when you want to compare some JSON text against
1374 another for equality.
1375
1376 array references
1377 Perl array references become JSON arrays.
1378
1379 other references
1380 Other unblessed references are generally not allowed and will cause
1381 an exception to be thrown, except for references to the integers 0
1382 and 1, which get turned into "false" and "true" atoms in JSON.
1383
1384 With the option "allow_stringify", you can ignore the exception and
1385 return the stringification of the perl value.
1386
1387 With the option "allow_unknown", you can ignore the exception and
1388 return "null" instead.
1389
1390 encode_json [\"x"] # => cannot encode reference to scalar 'SCALAR(0x..)'
1391 # unless the scalar is 0 or 1
1392 encode_json [\0, \1] # yields [false,true]
1393
1394 allow_stringify->encode_json [\"x"] # yields "x" unlike JSON::PP
1395 allow_unknown->encode_json [\"x"] # yields null as in JSON::PP
1396
1397 Cpanel::JSON::XS::true, Cpanel::JSON::XS::false
1398 These special values become JSON true and JSON false values,
1399 respectively. You can also use "\1" and "\0" or "!0" and "!1"
1400 directly if you want.
1401
1402 encode_json [Cpanel::JSON::XS::false, Cpanel::JSON::XS::true] # yields [false,true]
1403 encode_json [!1, !0] # yields [false,true]
1404
1405 eq/ne comparisons with true, false:
1406
1407 false is eq to the empty string or the string 'false' or the
1408 special empty string "!!0", i.e. "SV_NO", or the numbers 0 or 0.0.
1409
1410 true is eq to the string 'true' or to the special string "!0" (i.e.
1411 "SV_YES") or to the numbers 1 or 1.0.
1412
1413 blessed objects
1414 Blessed objects are not directly representable in JSON, but
1415 "Cpanel::JSON::XS" allows various optional ways of handling
1416 objects. See "OBJECT SERIALIZATION", below, for details.
1417
1418 See the "allow_blessed" and "convert_blessed" methods for the various
1419 options on how to deal with this: basically, you can choose between
1420 throwing an exception, encoding the reference as if it weren't
1421 blessed, using the object's overloaded stringification method or
1422 providing your own serializer method.
1423
1424 simple scalars
1425 Simple Perl scalars (any scalar that is not a reference) are the
1426 most difficult objects to encode: Cpanel::JSON::XS will encode
1427 undefined scalars or inf/nan as JSON "null" values and other
1428 scalars to either number or string in a non-deterministic way which
1429 may be affected or changed by Perl version or any other loaded Perl
1430 module.
1431
1432 If you want to have stable and deterministic types in JSON encoder
1433 then use Cpanel::JSON::XS::Type.
1434
1435 The non-deterministic behavior is as follows: scalars that have last
1436 been used in a string context before encoding are encoded as JSON
1437 strings, and anything else as a number value:
1438
1439 # dump as number
1440 encode_json [2] # yields [2]
1441 encode_json [-3.0e17] # yields [-3e+17]
1442 my $value = 5; encode_json [$value] # yields [5]
1443
1444 # used as string, but the two representations are for the same number
1445 print $value;
1446 encode_json [$value] # yields [5]
1447
1448 # used as different string (non-matching dual-var)
1449 my $str = '0 but true';
1450 my $num = 1 + $str;
1451 encode_json [$num, $str] # yields [1,"0 but true"]
1452
1453 # undef becomes null
1454 encode_json [undef] # yields [null]
1455
1456 # inf or nan becomes null, unless you answered
1457 # "Do you want to handle inf/nan as strings" with yes
1458 encode_json [9**9**9] # yields [null]
1459
1460 You can force the type to be a JSON string by stringifying it:
1461
1462 my $x = 3.1; # some variable containing a number
1463 "$x"; # stringified
1464 $x .= ""; # another, more awkward way to stringify
1465 print $x; # perl does it for you, too, quite often
1466
1467 You can force the type to be a JSON number by numifying it:
1468
1469 my $x = "3"; # some variable containing a string
1470 $x += 0; # numify it, ensuring it will be dumped as a number
1471 $x *= 1; # same thing, the choice is yours.
1472
1473 Note that numerical precision has the same meaning as under Perl
1474 (so binary to decimal conversion follows the same rules as in Perl,
1475 which can differ from other languages). Also, your perl interpreter
1476 might expose extensions to the floating point numbers of your
1477 platform, such as infinities or NaNs - these cannot be represented
1478 in JSON, and thus null is returned instead. Optionally you can
1479 configure it to stringify inf and nan values.
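
Putting the numify and stringify tricks for simple scalars together, a minimal sketch with made-up variable names:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $count = "3";    # arrived as a string
my $label = 3.1;    # arrived as a number

$count += 0;        # numify: will be dumped as a JSON number
$label .= "";       # stringify: will be dumped as a JSON string

my $text = encode_json ([$count, $label]);
print "$text\n";    # [3,"3.1"]
```

When the type must not depend on how a scalar was last used, Cpanel::JSON::XS::Type is the more robust choice.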
1480
1481 OBJECT SERIALIZATION
1482 As JSON cannot directly represent Perl objects, you have to choose
1483 between a pure JSON representation (without the ability to deserialize
1484 the object automatically again), and a nonstandard extension to the
1485 JSON syntax, tagged values.
1486
1487 SERIALIZATION
1488
1489 What happens when "Cpanel::JSON::XS" encounters a Perl object depends
1490 on the "allow_blessed", "convert_blessed" and "allow_tags" settings,
1491 which are used in this order:
1492
1493 1. "allow_tags" is enabled and the object has a "FREEZE" method.
1494 In this case, "Cpanel::JSON::XS" uses the Types::Serialiser object
1495 serialization protocol to create a tagged JSON value, using a
1496 nonstandard extension to the JSON syntax.
1497
1498 This works by invoking the "FREEZE" method on the object, with the
1499 first argument being the object to serialize, and the second
1500 argument being the constant string "JSON" to distinguish it from
1501 other serializers.
1502
1503 The "FREEZE" method can return any number of values (i.e. zero or
1504 more). These values and the package/classname of the object will
1505 then be encoded as a tagged JSON value in the following format:
1506
1507 ("classname")[FREEZE return values...]
1508
1509 e.g.:
1510
1511 ("URI")["http://www.google.com/"]
1512 ("MyDate")[2013,10,29]
1513 ("ImageData::JPEG")["Z3...VlCg=="]
1514
1515 For example, the hypothetical "My::Object" "FREEZE" method might
1516 use the object's "type" and "id" members to encode the object:
1517
1518 sub My::Object::FREEZE {
1519 my ($self, $serializer) = @_;
1520
1521 ($self->{type}, $self->{id})
1522 }
1523
1524 2. "convert_blessed" is enabled and the object has a "TO_JSON" method.
1525 In this case, the "TO_JSON" method of the object is invoked in
1526 scalar context. It must return a single scalar that can be directly
1527 encoded into JSON. This scalar replaces the object in the JSON
1528 text.
1529
1530 For example, the following "TO_JSON" method will convert all URI
1531 objects to JSON strings when serialized. The fact that these values
1532 originally were URI objects is lost.
1533
1534 sub URI::TO_JSON {
1535 my ($uri) = @_;
1536 $uri->as_string
1537 }
1538
1539 3. "convert_blessed" is enabled and the object has a stringification
1540 overload.
1541 In this case, the overloaded "" method of the object is invoked in
1542 scalar context. It must return a single scalar that can be directly
1543 encoded into JSON. This scalar replaces the object in the JSON
1544 text.
1545
1546 For example, the following "" method will convert all URI objects
1547 to JSON strings when serialized. The fact that these values
1548 originally were URI objects is lost.
1549
1550 package URI;
1551 use overload '""' => sub { shift->as_string };
1552
1553 4. "allow_blessed" is enabled.
1554 The object will be serialized as a JSON null value.
1555
1556 5. none of the above
1557 If none of the settings are enabled or the respective methods are
1558 missing, "Cpanel::JSON::XS" throws an exception.
1559
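The fallback order described above can be tried out with a short script. This is only a minimal sketch: the "My::Point" and "My::Opaque" classes are hypothetical and exist purely for illustration, and "canonical" is enabled just to make the key order predictable.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Hypothetical class with a TO_JSON method (illustration only).
package My::Point;
sub new     { my ($class, $x, $y) = @_; return bless { x => $x, y => $y }, $class }
sub TO_JSON { my ($self) = @_; return { x => $self->{x}, y => $self->{y} } }

# Hypothetical class without FREEZE, TO_JSON or string overloading.
package My::Opaque;
sub new { my ($class) = @_; my $self = {}; return bless $self, $class }

package main;

# convert_blessed: TO_JSON is called and its return value is encoded instead.
my $converted = Cpanel::JSON::XS->new->canonical->convert_blessed
                                ->encode ([My::Point->new (1, 2)]);

# allow_blessed: an object that cannot be converted becomes a JSON null.
my $nulled = Cpanel::JSON::XS->new->allow_blessed->encode ([My::Opaque->new]);

# none of the settings enabled: encode croaks, so trap the exception.
my $plain_ok = eval { Cpanel::JSON::XS->new->encode ([My::Opaque->new]); 1 };

print "$converted\n";   # [{"x":1,"y":2}]
print "$nulled\n";      # [null]
print "default settings: ", ($plain_ok ? "encoded" : "croaked"), "\n";
```

Trapping the exception with "eval" as in the last case is the usual way to handle objects that none of the settings cover.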
1560 DESERIALIZATION
1561
1562 For deserialization there are only two cases to consider: either
1563 nonstandard tagging was used, in which case "allow_tags" decides, or
1564 objects cannot be automatically deserialized, in which case you can
1565 use postprocessing or the "filter_json_object" or
1566 "filter_json_single_key_object" callbacks to get some real objects out
1567 of your JSON.
1568
1569 This section only considers the tagged value case: if a tagged JSON
1570 object is encountered during decoding and "allow_tags" is disabled, a
1571 parse error will result (as if tagged values were not part of the
1572 grammar).
1573
1574 If "allow_tags" is enabled, "Cpanel::JSON::XS" will look up the "THAW"
1575 method of the package/classname used during serialization (it will not
1576 attempt to load the package as a Perl module). If there is no such
1577 method, the decoding will fail with an error.
1578
1579 Otherwise, the "THAW" method is invoked with the classname as first
1580 argument, the constant string "JSON" as second argument, and all the
1581 values from the JSON array (the values originally returned by the
1582 "FREEZE" method) as remaining arguments.
1583
1584 The method must then return the object. While technically you can
1585 return any Perl scalar, you might have to enable the "allow_nonref"
1586 setting to make that work in all cases, so better return an actual
1587 blessed reference.
1588
1589 As an example, let's implement a "THAW" function that regenerates the
1590 "My::Object" from the "FREEZE" example earlier:
1591
1592 sub My::Object::THAW {
1593 my ($class, $serializer, $type, $id) = @_;
1594
1595 $class->new (type => $type, id => $id)
1596 }
1597
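Putting the "FREEZE" and "THAW" halves together, a complete round trip might look like the following. It is a sketch under stated assumptions: the "My::Object" class and its "new" constructor are hypothetical, and the same coder with "allow_tags" enabled is used for both directions.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

package My::Object;   # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

# called by encode: return the values needed to rebuild the object
sub FREEZE {
    my ($self, $serializer) = @_;
    return ($self->{type}, $self->{id});
}

# called by decode: receives the classname, "JSON" and the frozen values
sub THAW {
    my ($class, $serializer, $type, $id) = @_;
    return $class->new (type => $type, id => $id);
}

package main;

my $coder = Cpanel::JSON::XS->new->allow_tags;

# encode calls FREEZE and emits the nonstandard tagged syntax
my $json = $coder->encode ([My::Object->new (type => "user", id => 42)]);

# decode looks up My::Object::THAW and calls it with the frozen values
my $copy = $coder->decode ($json)->[0];

print "$json\n";
print ref ($copy), " type=$copy->{type} id=$copy->{id}\n";
```

Only enable "allow_tags" on the decoding side for input you trust, since any class that has a "THAW" method can be instantiated by the sender.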
1598 See the "SECURITY CONSIDERATIONS" section below. Allowing external JSON
1599 objects to be deserialized into Perl objects is usually a very bad idea.
1600
1602 The interested reader might have seen a number of flags that signify
1603 encodings or codesets - "utf8", "latin1", "binary" and "ascii". There
1604 seems to be some confusion on what these do, so here is a short
1605 comparison:
1606
1607 "utf8" controls whether the JSON text created by "encode" (and expected
1608 by "decode") is UTF-8 encoded or not, while "latin1" and "ascii" only
1609 control whether "encode" escapes character values outside their
1609 respective codeset range. None of these flags conflicts with the
1610 others, although some combinations make less sense than others.
1612
1613 Care has been taken to make all flags symmetrical with respect to
1614 "encode" and "decode", that is, texts encoded with any combination of
1615 these flag values will be correctly decoded when the same flags are
1616 used - in general, if you use different flag settings while encoding
1617 vs. when decoding you likely have a bug somewhere.
1618
1619 Below comes a verbose discussion of these flags. Note that a "codeset"
1620 is simply an abstract set of character-codepoint pairs, while an
1621 encoding takes those codepoint numbers and encodes them, in our case
1622 into octets. Unicode is (among other things) a codeset, UTF-8 is an
1623 encoding, and ISO-8859-1 (= latin 1) and ASCII are both codesets and
1624 encodings at the same time, which can be confusing.
1625
1626 "utf8" flag disabled
1627 When "utf8" is disabled (the default), then "encode"/"decode"
1628 generate and expect Unicode strings, that is, characters with high
1629 ordinal Unicode values (> 255) will be encoded as such characters,
1630 and likewise such characters are decoded as-is, no changes to them
1631 will be done, except "(re-)interpreting" them as Unicode codepoints
1632 or Unicode characters, respectively (to Perl, these are the same
1633 thing in strings unless you do funny/weird/dumb stuff).
1634
1635 This is useful when you want to do the encoding yourself (e.g. when
1636 you want to have UTF-16 encoded JSON texts) or when some other
1637 layer does the encoding for you (for example, when printing to a
1638 terminal using a filehandle that transparently encodes to UTF-8 you
1639 certainly do NOT want to UTF-8 encode your data first and have Perl
1640 encode it another time).
1641
1642 "utf8" flag enabled
1643 If the "utf8"-flag is enabled, "encode"/"decode" will encode all
1644 characters using the corresponding UTF-8 multi-byte sequence, and
1645 will expect your input strings to be encoded as UTF-8, that is, no
1646 "character" of the input string must have any value > 255, as UTF-8
1647 does not allow that.
1648
1649 The "utf8" flag therefore switches between two modes: disabled
1650 means you will get a Unicode string in Perl, enabled means you get
1651 an UTF-8 encoded octet/binary string in Perl.
1652
1653 "latin1", "binary" or "ascii" flags enabled
1654 With "latin1" (or "ascii") enabled, "encode" will escape characters
1655 with ordinal values > 255 (> 127 with "ascii") and encode the
1656 remaining characters as specified by the "utf8" flag. With
1657 "binary" enabled, ordinal values > 255 are illegal.
1658
1659 If "utf8" is disabled, then the result is also correctly encoded in
1660 those character sets (as both are proper subsets of Unicode,
1661 meaning that a Unicode string with all character values < 256 is
1662 the same thing as an ISO-8859-1 string, and a Unicode string with
1663 all character values < 128 is the same thing as an ASCII string in
1664 Perl).
1665
1666 If "utf8" is enabled, you still get a correct UTF-8-encoded string,
1667 regardless of these flags, just some more characters will be
1668 escaped using "\uXXXX" than before.
1669
1670 Note that ISO-8859-1-encoded strings are not compatible with UTF-8
1671 encoding, while ASCII-encoded strings are. That is because the
1672 ISO-8859-1 encoding is NOT a subset of UTF-8 (despite the
1673 ISO-8859-1 codeset being a subset of Unicode), while ASCII is.
1674
1675 Surprisingly, "decode" will ignore these flags and so treat all
1676 input values as governed by the "utf8" flag. If it is disabled,
1677 this allows you to decode ISO-8859-1- and ASCII-encoded strings, as
1678 both are strict subsets of Unicode. If it is enabled, you can correctly
1679 decode UTF-8 encoded strings.
1680
1681 So neither "latin1", "binary" nor "ascii" are incompatible with the
1682 "utf8" flag - they only govern when the JSON output engine escapes
1683 a character or not.
1684
1685 The main use for "latin1" or "binary" is to relatively efficiently
1686 store binary data as JSON, at the expense of breaking compatibility
1687 with most JSON decoders.
1688
1689 The main use for "ascii" is to force the output to not contain
1690 characters with values > 127, which means you can interpret the
1691 resulting string as UTF-8, ISO-8859-1, ASCII, KOI8-R or just about
1692 any character set and 8-bit-encoding, and still get the same data
1693 structure back. This is useful when your channel for JSON transfer
1694 is not 8-bit clean or the encoding might be mangled in between
1695 (e.g. in mail), and works because ASCII is a proper subset of most
1696 8-bit and multibyte encodings in use in the world.
1697
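The effect of the flags discussed above can be compared on one small value. The following is an illustrative sketch only; the sample string is made up and contains one character below 256 (U+00E9) and one above 255 (U+263A).

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $data = ["caf\x{e9} \x{263a}"];   # U+00E9 (< 256) and U+263A (> 255)

my $chars  = Cpanel::JSON::XS->new->encode ($data);          # Unicode string
my $octets = Cpanel::JSON::XS->new->utf8->encode ($data);    # UTF-8 octets
my $latin1 = Cpanel::JSON::XS->new->latin1->encode ($data);  # only U+263A escaped
my $ascii  = Cpanel::JSON::XS->new->ascii->encode ($data);   # both escaped

printf "no flags: %d characters\n", length $chars;    # 10
printf "utf8:     %d octets\n",     length $octets;   # 13
printf "latin1 escapes U+263A: %s\n", $latin1 =~ /\\u263a/i      ? "yes" : "no";
printf "ascii is 7-bit clean:  %s\n", $ascii  =~ /^[\x00-\x7f]+$/ ? "yes" : "no";

# decode only looks at the utf8 flag, so matching coders round-trip the data
my $from_octets = Cpanel::JSON::XS->new->utf8->decode ($octets);
my $from_ascii  = Cpanel::JSON::XS->new->ascii->decode ($ascii);
print "round trip ok\n"
    if $from_octets->[0] eq $data->[0] && $from_ascii->[0] eq $data->[0];
```

The lengths differ because the character string counts U+00E9 and U+263A as one character each, while the UTF-8 form needs two and three octets for them.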
1698 JSON and ECMAscript
1699 JSON syntax is based on how literals are represented in javascript (the
1700 not-standardized predecessor of ECMAscript) which is presumably why it
1701 is called "JavaScript Object Notation".
1702
1703 However, JSON is not a subset (and also not a superset of course) of
1704 ECMAscript (the standard) or javascript (whatever browsers actually
1705 implement).
1706
1707 If you want to use javascript's "eval" function to "parse" JSON, you
1708 might run into parse errors for valid JSON texts, or the resulting data
1709 structure might not be queryable:
1710
1711 One of the problems is that U+2028 and U+2029 are valid characters
1712 inside JSON strings, but are not allowed in ECMAscript string literals,
1713 so the following Perl fragment will not output something that can be
1714 guaranteed to be parsable by javascript's "eval":
1715
1716 use Cpanel::JSON::XS;
1717
1718 print encode_json [chr 0x2028];
1719
1720 The right fix for this is to use a proper JSON parser in your
1721 javascript programs, and not rely on "eval" (see for example Douglas
1722 Crockford's json2.js parser).
1723
1724 If this is not an option, you can, as a stop-gap measure, simply encode
1725 to ASCII-only JSON:
1726
1727 use Cpanel::JSON::XS;
1728
1729 print Cpanel::JSON::XS->new->ascii->encode ([chr 0x2028]);
1730
1731 Note that this will enlarge the resulting JSON text quite a bit if you
1732 have many non-ASCII characters. You might be tempted to run some
1733 regexes to only escape U+2028 and U+2029, e.g.:
1734
1735 # DO NOT USE THIS!
1736 my $json = Cpanel::JSON::XS->new->utf8->encode ([chr 0x2028]);
1737 $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
1738 $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
1739 print $json;
1740
1741 Note that this is a bad idea: the above only works for U+2028 and
1742 U+2029 and thus only for fully ECMAscript-compliant parsers. Many
1743 existing javascript implementations, however, have issues with other
1744 characters as well - using "eval" naively simply will cause problems.
1745
1746 Another problem is that some javascript implementations reserve some
1747 property names for their own purposes (which probably makes them non-
1748 ECMAscript-compliant). For example, Iceweasel reserves the "__proto__"
1749 property name for its own purposes.
1750
1751 If that is a problem, you could try to filter the resulting JSON
1752 output for these property strings, e.g.:
1753
1754 $json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
1755
1756 This works because "__proto__" is not valid outside of strings, so
1757 every occurrence of ""__proto__"\s*:" must be a string used as property
1758 name.
1759
1760 Unicode non-characters between U+FFFD and U+10FFFF are decoded either
1761 to the recommended U+FFFD REPLACEMENT CHARACTER (see Unicode PR #121:
1762 Recommended Practice for Replacement Characters), or in the binary or
1763 relaxed mode left as is, keeping the illegal non-characters as before.
1764
1765 Raw non-Unicode characters outside the valid Unicode range now fail to
1766 parse, because "A string is a sequence of zero or more Unicode
1767 characters" (RFC 7159 section 1) and "JSON text SHALL be encoded in
1768 Unicode" (RFC 7159 section 8.1). We now use the UTF8_DISALLOW_SUPER flag
1769 when parsing Unicode.
1770
1771 If you know of other incompatibilities, please let me know.
1772
1773 JSON and YAML
1774 You often hear that JSON is a subset of YAML. In general, there is no
1775 way to configure JSON::XS to output a data structure as valid YAML that
1776 works in all cases. If you really must use Cpanel::JSON::XS to
1777 generate YAML, you should use this algorithm (subject to change in
1778 future versions):
1779
1780 my $to_yaml = Cpanel::JSON::XS->new->utf8->space_after (1);
1781 my $yaml = $to_yaml->encode ($ref) . "\n";
1782
1783 This will usually generate JSON texts that also parse as valid YAML.
1784
1785 SPEED
1786 It seems that JSON::XS is surprisingly fast, as shown in the following
1787 tables. They have been generated with the help of the "eg/bench"
1788 program in the JSON::XS distribution, to make it easy to compare on
1789 your own system.
1790
1791 Together with Data::MessagePack and Sereal, JSON::XS is one of the
1792 fastest serializers, because JSON and JSON::XS do not support backrefs
1793 (no graph structures), only trees. Storable supports backrefs, i.e.
1794 graphs. Data::MessagePack encodes its data in binary (as Storable does)
1795 and supports only a very simple subset of JSON.
1796
1797 First comes a comparison between various modules using a very short
1798 single-line JSON string (also available at
1799 <http://dist.schmorp.de/misc/json/short.json>).
1800
1801 {"method": "handleMessage", "params": ["user1",
1802 "we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7,
1803 1, 0]}
1804
1805 It shows the number of encodes/decodes per second (JSON::XS uses the
1806 functional interface, while JSON::XS/2 uses the OO interface
1807 with pretty-printing and hash key sorting enabled, and JSON::XS/3
1808 enables shrink. JSON::DWIW/DS uses the deserialize function, while
1809 JSON::DWIW/FJ uses the from_json method). Higher is better:
1810
1811 module | encode | decode |
1812 --------------|------------|------------|
1813 JSON::DWIW/DS | 86302.551 | 102300.098 |
1814 JSON::DWIW/FJ | 86302.551 | 75983.768 |
1815 JSON::PP | 15827.562 | 6638.658 |
1816 JSON::Syck | 63358.066 | 47662.545 |
1817 JSON::XS | 511500.488 | 511500.488 |
1818 JSON::XS/2 | 291271.111 | 388361.481 |
1819 JSON::XS/3 | 361577.931 | 361577.931 |
1820 Storable | 66788.280 | 265462.278 |
1821 --------------+------------+------------+
1822
1823 That is, JSON::XS is almost six times faster than JSON::DWIW on
1824 encoding, about five times faster on decoding, and over thirty to
1825 seventy times faster than JSON's pure perl implementation. It also
1826 compares favourably to Storable for small amounts of data.
1827
1828 Using a longer test string (roughly 18KB, generated from Yahoo! Local's
1829 search API, <http://dist.schmorp.de/misc/json/long.json>):
1830
1831 module | encode | decode |
1832 --------------|------------|------------|
1833 JSON::DWIW/DS | 1647.927 | 2673.916 |
1834 JSON::DWIW/FJ | 1630.249 | 2596.128 |
1835 JSON::PP | 400.640 | 62.311 |
1836 JSON::Syck | 1481.040 | 1524.869 |
1837 JSON::XS | 20661.596 | 9541.183 |
1838 JSON::XS/2 | 10683.403 | 9416.938 |
1839 JSON::XS/3 | 20661.596 | 9400.054 |
1840 Storable | 19765.806 | 10000.725 |
1841 --------------+------------+------------+
1842
1843 Again, JSON::XS leads by far (except for Storable, which
1844 unsurprisingly decodes a bit faster).
1845
1846 On large strings containing lots of high Unicode characters, some
1847 modules (such as JSON::PC) seem to decode faster than JSON::XS, but the
1848 result will be broken due to missing (or wrong) Unicode handling.
1849 Others refuse to decode or encode properly, so it was impossible to
1850 prepare a fair comparison table for that case.
1851
1852 For updated graphs see
1853 <https://github.com/Sereal/Sereal/wiki/Sereal-Comparison-Graphs>
1854
1856 As long as you only serialize data that can be directly expressed in
1857 JSON, "Cpanel::JSON::XS" is incapable of generating invalid JSON output
1858 (modulo bugs, but "JSON::XS" has found more bugs in the official JSON
1859 testsuite (1) than the official JSON testsuite has found in "JSON::XS"
1860 (0)). "Cpanel::JSON::XS" is currently the only known JSON decoder
1861 which passes all <http://seriot.ch/parsing_json.html> tests, while
1862 being the fastest also.
1863
1864 When you have trouble decoding JSON generated by this module using
1865 other decoders, then it is very likely that you have an encoding
1866 mismatch or the other decoder is broken.
1867
1868 When decoding, "JSON::XS" is strict by default and will likely catch
1869 all errors. There are currently two settings that change this:
1870 "relaxed" makes "JSON::XS" accept (but not generate) some non-standard
1871 extensions, and "allow_tags" or "allow_blessed" will allow you to
1872 encode and decode Perl objects, at the cost of being totally insecure
1873 and not outputting valid JSON anymore.
1874
1875 JSON-XS-3.01 broke interoperability with JSON-2.90 with booleans. See
1876 JSON.
1877
1878 Cpanel::JSON::XS needs to know the JSON and JSON::XS versions to be
1879 able to work with those objects, especially when encoding booleans like
1880 "{"is_true":true}". So you need to load these modules first.
1881
1882 true/false overloading and boolean representations are supported.
1883
1884 JSON::XS and JSON::PP representations are accepted and older JSON::XS
1885 accepts Cpanel::JSON::XS booleans. All JSON modules (JSON, JSON::PP,
1886 JSON::XS, Cpanel::JSON::XS) produce JSON::PP::Boolean objects; only Mojo
1887 and JSON::YAJL do not. Mojo produces Mojo::JSON::_Bool and
1888 JSON::YAJL::Parser just an unblessed IV.
1889
1890 Cpanel::JSON::XS accepts JSON::PP::Boolean and Mojo::JSON::_Bool
1891 objects as booleans.
1892
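A short sketch of how the boolean objects behave in practice follows. The key names are made up for illustration, and "canonical" is only enabled to get a stable key order in the output.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $coder = Cpanel::JSON::XS->new->canonical;

# JSON true/false decode to boolean objects that behave like 1 and 0
my $data = $coder->decode ('{"is_false":false,"is_true":true}');

print "is_true is a boolean\n" if Cpanel::JSON::XS::is_bool ($data->{is_true});
print "is_true is true\n"      if $data->{is_true};
print "is_false is false\n"    unless $data->{is_false};

# the boolean objects encode back to JSON true/false, not to 1/0 or strings
my $json = $coder->encode ($data);
print "$json\n";   # {"is_false":false,"is_true":true}
```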
1893 I cannot think of any reason to still use JSON::XS anymore.
1894
1895 TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS
1896 When you use "allow_tags" to use the extended (and also nonstandard and
1897 invalid) JSON syntax for serialized objects, and you still want to
1898 decode the generated serialized objects, you can run a regex to replace
1899 the tagged syntax by standard JSON arrays (it only works for "normal"
1900 package names without comma, newlines or single colons). First, the
1901 readable Perl version:
1902
1903 # if your FREEZE methods return no values, you need this replace first:
1904 $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;
1905
1906 # this works for non-empty constructor arg lists:
1907 $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;
1908
1909 And here is a less readable version that is easy to adapt to other
1910 languages:
1911
1912 $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;
1913
1914 Here is an ECMAScript version (same regex):
1915
1916 json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,");
1917
1918 Since this syntax converts to standard JSON arrays, it might be hard to
1919 distinguish serialized objects from normal arrays. You can prepend a
1920 "magic number" as first array element to reduce chances of a collision:
1921
1922 $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g;
1923
1924 And after decoding the JSON text, you could walk the data structure
1925 looking for arrays with a first element of
1926 "XU1peReLzT4ggEllLanBYq4G9VzliwKF".
1927
1928 The same approach can be used to create the tagged format with another
1929 encoder. First, you create an array with the magic string as first
1930 member, the classname as second, and constructor arguments last, encode
1931 it as part of your JSON structure, and then:
1932
1933 $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;
1934
1935 Again, this has some limitations - the magic string must not be encoded
1936 with character escapes, and the constructor arguments must be non-
1937 empty.
1938
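As a worked example of the first substitution, the sketch below rewrites a tagged text into standard JSON arrays and then decodes it without "allow_tags". The input string is hand-written sample data modelled on the tagged examples in this document, not real encoder output.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# hand-written sample in the tagged syntax (illustration only)
my $json = '[("MyDate")[2013,10,29],("URI")["http://www.google.com/"]]';

# same regex as above: turn ("class")[args...] into ["class",args...]
(my $plain = $json) =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;

print "$plain\n";   # [["MyDate",2013,10,29],["URI","http://www.google.com/"]]

# the result is standard JSON, so no allow_tags is needed to decode it
my $data = Cpanel::JSON::XS->new->decode ($plain);
my ($class, @args) = @{ $data->[0] };
print "class: $class, args: ", join (",", @args), "\n";
```

After decoding, the classname is simply the first array element, so it is up to the application to decide whether and how to rebuild an object from it.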
1940 Since this module was written, Google has written a new JSON RFC, RFC
1941 7159 (and RFC7158). Unfortunately, this RFC breaks compatibility with
1942 both the original JSON specification on www.json.org and RFC4627.
1943
1944 As far as I can see, you can get partial compatibility when parsing by
1945 using "->allow_nonref". However, consider the security implications of
1946 doing so.
1947
1948 I haven't decided yet when to break compatibility with RFC4627 by
1949 default (and potentially leave applications insecure) and change the
1950 default to follow RFC7159, but application authors are well advised to
1951 call "->allow_nonref(0)" even if this is the current default, if they
1952 cannot handle non-reference values, in preparation for the day when the
1953 default will change.
1954
1956 JSON::XS and Cpanel::JSON::XS are not only fast. JSON is generally the
1957 most secure serialization format, because besides Data::MessagePack it
1958 is the only one that does not deserialize objects by default. This holds
1959 for all languages, not just Perl. The binary variant BSON (MongoDB) does
1960 more but is unsafe.
1961
1962 It is trivial for any attacker to create such serialized objects in
1963 JSON and trick perl into expanding them, thereby triggering certain
1964 methods. Watch <https://www.youtube.com/watch?v=Gzx6KlqiIZE> for an
1965 exploit demo for "CVE-2015-1592 SixApart MovableType Storable Perl Code
1966 Execution" for a deserializer which expands objects. Deserializing
1967 even coderefs (methods, functions) or external data would be considered
1968 the most dangerous.
1969
1970 Security relevant overview of serializers regarding deserializing
1971 objects by default:
1972
1973 Objects Coderefs External Data
1974
1975 Data::Dumper YES YES YES
1976 Storable YES NO (def) NO
1977 Sereal YES NO NO
1978 YAML YES NO NO
1979 B::C YES YES YES
1980 B::Bytecode YES YES YES
1981 BSON YES YES NO
1982 JSON::SL YES NO YES
1983 JSON NO (def) NO NO
1984 Data::MessagePack NO NO NO
1985 XML NO NO YES
1986
1987 Pickle YES YES YES
1988 PHP Deserialize YES NO NO
1989
1990 When you are using JSON in a protocol, talking to untrusted potentially
1991 hostile creatures requires relatively few measures.
1992
1993 First of all, your JSON decoder should be secure, that is, should not
1994 have any buffer overflows. Obviously, this module should ensure that.
1995
1996 Second, you need to avoid resource-starving attacks. That means you
1997 should limit the size of JSON texts you accept, or make sure that when
1998 your resources run out, that's just fine (e.g. by using a separate
1999 process that can crash safely). The size of a JSON text in octets or
2000 characters is usually a good indication of the size of the resources
2001 required to decode it into a Perl structure. While JSON::XS can check
2002 the size of the JSON text, it might be too late when you already have
2003 it in memory, so you might want to check the size before you accept the
2004 string.
2005
2006 Third, Cpanel::JSON::XS recurses using the C stack when decoding
2007 objects and arrays. The C stack is a limited resource: for instance, on
2008 my amd64 machine with 8MB of stack size I can decode around 180k nested
2009 arrays but only 14k nested JSON objects (due to perl itself recursing
2010 deeply on croak to free the temporary). If that is exceeded, the
2011 program crashes. To be conservative, the default nesting limit is set
2012 to 512. If your process has a smaller stack, you should adjust this
2013 setting accordingly with the "max_depth" method.
2014
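The size and nesting limits discussed above could be applied as in the following sketch. The concrete limits (32 levels, 64KB) are arbitrary illustrative values, not recommendations.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# illustrative limits, tune them to your own protocol
my $coder = Cpanel::JSON::XS->new->utf8->max_depth (32)->max_size (64 * 1024);

my $shallow = '[[1,2],[3,4]]';
my $deep    = ("[" x 100) . ("]" x 100);   # 100 nested arrays

# decode croaks when a limit is exceeded, so trap the exception
my @accepted = map { defined eval { $coder->decode ($_) } ? 1 : 0 }
               ($shallow, $deep);

printf "shallow text: %s\n", $accepted[0] ? "accepted" : "rejected";
printf "deep text:    %s\n", $accepted[1] ? "accepted" : "rejected";
```

Note that "max_size" is checked only once the text is already in memory, so a protocol-level length check before reading the data is still worthwhile.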
2015 Also keep in mind that Cpanel::JSON::XS might leak contents of your
2016 Perl data structures in its error messages, so when you serialize
2017 sensitive information you might want to make sure that exceptions
2018 thrown by JSON::XS will not end up in front of untrusted eyes.
2019
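One way to keep detailed error messages away from untrusted peers is to trap the exception and send a generic reply instead. The helper name "encode_for_client" and the reply text below are assumptions made for this sketch, not part of the module.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $coder = Cpanel::JSON::XS->new->utf8;

# Hypothetical helper: returns the JSON text, or a generic error document
# when encoding fails. The detailed message only goes to the local log.
sub encode_for_client {
    my ($data) = @_;
    my $json = eval { $coder->encode ($data) };
    return $json if defined $json;
    warn "JSON encode failed: $@";          # detailed message, keep it local
    return '{"error":"internal error"}';    # generic reply for the peer
}

# a code reference cannot be encoded with default settings, so encode croaks
my $reply = encode_for_client ({ secret => "hunter2", callback => sub { 1 } });
print "$reply\n";   # {"error":"internal error"}
```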
2020 If you are using Cpanel::JSON::XS to return packets for consumption by
2021 JavaScript scripts in a browser you should have a look at
2022 <http://blog.archive.jpsykes.com/47/practical-csrf-and-json-security/>
2023 to see whether you are vulnerable to some common attack vectors (which
2024 really are browser design bugs, but it is still you who will have to
2025 deal with it, as major browser developers care only for features, not
2026 about getting security right). You might also want to look at
2027 Mojo::JSON's special escape rules to prevent XSS attacks.
2028
2030 TL;DR: Due to security concerns, Cpanel::JSON::XS will not allow scalar
2031 data in JSON texts by default - you need to create your own
2032 Cpanel::JSON::XS object and enable "allow_nonref":
2033
2034 my $json = Cpanel::JSON::XS->new->allow_nonref;
2035
2036 $text = $json->encode ($data);
2037 $data = $json->decode ($text);
2038
2039 The long version: JSON being an important and supposedly stable format,
2040 the IETF standardized it as RFC 4627 in 2006. Unfortunately, the
2041 inventor of JSON, Douglas Crockford, unilaterally changed the definition
2042 of JSON in javascript. Rather than create a fork, the IETF decided to
2043 standardize the new syntax (apparently, so I was told, without finding
2044 it very amusing).
2045
2046 The biggest difference between the original JSON and the new JSON is
2047 that the new JSON supports scalars (anything other than arrays and
2048 objects) at the top-level of a JSON text. While this is strictly
2049 backwards compatible to older versions, it breaks a number of protocols
2050 that relied on sending JSON back-to-back, and is a minor security
2051 concern.
2052
2053 For example, imagine you have two banks communicating, and on one side,
2054 the JSON coder gets upgraded. Two messages, such as 10 and 1000 might
2055 then be confused to mean 101000, something that couldn't happen in the
2056 original JSON, because neither of these messages would be valid JSON.
2057
2058 If one side accepts these messages, then an upgrade in the coder on
2059 either side could result in this becoming exploitable.
2060
2061 This module has always allowed these messages as an optional extension,
2062 by default disabled. The security concerns are the reason why the
2063 default is still disabled, but future versions might/will likely
2064 upgrade to the newer RFC as default format, so you are advised to check
2065 your implementation and/or override the default with "->allow_nonref
2066 (0)" to ensure that future versions are safe.
2067
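The difference between the two behaviours can be made explicit by setting "allow_nonref" on both coders instead of relying on the default. This is a minimal sketch; the sample texts are arbitrary.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $strict = Cpanel::JSON::XS->new->allow_nonref (0);   # RFC 4627 behaviour
my $loose  = Cpanel::JSON::XS->new->allow_nonref (1);   # RFC 7159 behaviour

my %result;
for my $text ('[10]', '10', '"text"') {
    # decode croaks on top-level scalars unless allow_nonref is enabled
    my $strict_ok = eval { $strict->decode ($text); 1 } ? "accepted" : "rejected";
    my $loose_ok  = eval { $loose->decode ($text);  1 } ? "accepted" : "rejected";
    $result{$text} = [$strict_ok, $loose_ok];
    print "$text: allow_nonref(0) $strict_ok, allow_nonref(1) $loose_ok\n";
}
```

Setting the flag explicitly in both directions keeps the behaviour stable even if a future release changes the default.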
2069 Cpanel::JSON::XS has proper ithreads support, unlike JSON::XS. If you
2070 encounter any bugs with thread support please report them.
2071
2073 While the goal of the Cpanel::JSON::XS module is to be correct, that
2074 unfortunately does not mean it's bug-free, only that the author thinks
2075 its design is bug-free. If you keep reporting bugs and tests they will
2076 be fixed swiftly, though.
2077
2078 Since the JSON::XS author refuses to use a public bugtracker and
2079 prefers private emails, we use the tracker at github, so you might want
2080 to report any issues twice: once in private to MLEHMANN to be fixed in
2081 JSON::XS and once to our public tracker. Issues fixed by JSON::XS
2082 with a new release will also be backported to Cpanel::JSON::XS and
2083 5.6.2, as long as cPanel relies on 5.6.2 and Cpanel::JSON::XS as our
2084 serializer of choice.
2085
2086 <https://github.com/rurban/Cpanel-JSON-XS/issues>
2087
2089 This module is available under the same licences as perl, the Artistic
2090 license and the GPL.
2091
2093 The cpanel_json_xs command line utility for quick experiments.
2094
2095 JSON, JSON::XS, JSON::MaybeXS, Mojo::JSON, Mojo::JSON::MaybeXS,
2096 JSON::SL, JSON::DWIW, JSON::YAJL, JSON::Any, Test::JSON,
2097 Locale::Wolowitz, <https://metacpan.org/search?q=JSON>
2098
2099 <https://tools.ietf.org/html/rfc7159>
2100
2101 <https://tools.ietf.org/html/rfc4627>
2102
2104 Reini Urban <rurban@cpan.org>
2105
2106 Marc Lehmann <schmorp@schmorp.de>, http://home.schmorp.de/
2107
2109 Reini Urban <rurban@cpan.org>
2110
2111
2112
2113perl v5.30.0 2019-07-26 XS(3)