XS(3) User Contributed Perl Documentation XS(3)

NAME
JSON::XS - JSON serialising/deserialising, done correctly and fast

JSON::XS - correct and fast JSON serialiser/deserialiser
(http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html)

SYNOPSIS
use JSON::XS;

# exported functions, they croak on error
# and expect/generate UTF-8

$utf8_encoded_json_text = encode_json $perl_hash_or_arrayref;
$perl_hash_or_arrayref = decode_json $utf8_encoded_json_text;

# OO-interface

$coder = JSON::XS->new->ascii->pretty->allow_nonref;
$pretty_printed_unencoded = $coder->encode ($perl_scalar);
$perl_scalar = $coder->decode ($unicode_json_text);

# Note that JSON version 2.0 and above will automatically use JSON::XS
# if available, at virtually no speed overhead either, so you should
# be able to just:

use JSON;

# and do the same things, except that you have a pure-perl fallback now.

DESCRIPTION
This module converts Perl data structures to JSON and vice versa. Its
primary goal is to be correct and its secondary goal is to be fast. To
reach the latter goal it was written in C.

Beginning with version 2.0 of the JSON module, when both JSON and
JSON::XS are installed, JSON will use JSON::XS as its backend (this can
be overridden) with no overhead due to emulation (by inheriting
constructor and methods). If JSON::XS is not available, it will fall
back to the compatible JSON::PP module as backend, so using JSON
instead of JSON::XS gives you a portable JSON API that can be fast when
you need it and doesn't require a C compiler when that is a problem.

As this is the n-th-something JSON module on CPAN, what was the reason
to write yet another JSON module? While it seems there are many JSON
modules, none of them handles all corner cases correctly, and in most
cases their maintainers are unresponsive, gone missing, or not
listening to bug reports for other reasons.

See MAPPING, below, on how JSON::XS maps Perl values to JSON values and
vice versa.

FEATURES
· correct Unicode handling

This module knows how to handle Unicode, documents how and when it
does so, and even documents what "correct" means.

· round-trip integrity

When you serialise a Perl data structure using only data types
supported by JSON and Perl, the deserialised data structure is
identical on the Perl level (e.g. the string "2.0" doesn't
suddenly become "2" just because it looks like a number). There are
minor exceptions to this; read the MAPPING section below to learn
about those.

· strict checking of JSON correctness

There is no guessing, no generating of illegal JSON texts by
default, and only JSON is accepted as input by default (the latter
is a security feature).

· fast

Compared to other JSON modules and other serialisers such as
Storable, this module usually compares favourably in terms of
speed, too.

· simple to use

This module has both a simple functional interface and an
object-oriented interface.

· reasonably versatile output formats

You can choose between the most compact guaranteed-single-line
format possible (nice for simple line-based protocols), a pure-ASCII
format (for when your transport is not 8-bit clean, still
supports the whole Unicode range), or a pretty-printed format (for
when you want to read that stuff). Or you can combine those
features in whatever way you like.

FUNCTIONAL INTERFACE
The following convenience functions are provided by this module. They
are exported by default:

$json_text = encode_json $perl_scalar
Converts the given Perl data structure to a UTF-8 encoded, binary
string (that is, the string contains octets only). Croaks on error.

This function call is functionally identical to:

$json_text = JSON::XS->new->utf8->encode ($perl_scalar)

except that it is faster.

$perl_scalar = decode_json $json_text
The opposite of "encode_json": expects a UTF-8 (binary) string and
tries to parse that as a UTF-8 encoded JSON text, returning the
resulting reference. Croaks on error.

This function call is functionally identical to:

$perl_scalar = JSON::XS->new->utf8->decode ($json_text)

except that it is faster.
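The round trip below is a minimal sketch of the two exported functions, assuming JSON::XS is installed; the hash contents are made up for illustration. It shows that "encode_json" returns octets (the U+00E9 in the data becomes the two bytes C3 A9) and that errors must be trapped with eval, since both functions croak.

```perl
use strict;
use warnings;
use JSON::XS;    # exports encode_json and decode_json by default

# Illustrative data containing one non-ASCII character (U+00E9).
my $data = { name => "caf\x{e9}", list => [1, 2, 3] };

# encode_json returns UTF-8 encoded octets ...
my $bytes = encode_json $data;

# ... and decode_json expects exactly such octets.
my $copy = decode_json $bytes;

# Both functions croak on error, so trap failures with eval.
my $ok = eval { decode_json '{"broken"'; 1 };
print "decode failed: $@" unless $ok;
```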

A FEW NOTES ON UNICODE AND PERL
Since this often leads to confusion, here are a few very clear words on
how Unicode works in Perl, modulo bugs.

1. Perl strings can store characters with ordinal values > 255.
This enables you to store Unicode characters as single characters
in a Perl string - very natural.

2. Perl does not associate an encoding with your strings.
... until you force it to, e.g. when matching it against a regex,
or printing the scalar to a file, in which case Perl either
interprets your string as locale-encoded text, octets/binary, or as
Unicode, depending on various settings. In no case is an encoding
stored together with your data; it is the use that decides the
encoding, not any magical metadata.

3. The internal utf-8 flag has no meaning with regard to the encoding
of your string.
Just ignore that flag unless you debug a Perl bug, a module written
in XS or want to dive into the internals of perl. Otherwise it will
only confuse you, as, despite the name, it says nothing about how
your string is encoded. You can have Unicode strings with that flag
set, with that flag clear, and you can have binary data with that
flag set and that flag clear. Other possibilities exist, too.

If you didn't know about that flag, so much the better: pretend it
doesn't exist.

4. A "Unicode String" is simply a string where each character can be
validly interpreted as a Unicode code point.
If you have UTF-8 encoded data, it is no longer a Unicode string,
but a Unicode string encoded in UTF-8, giving you a binary string.

5. A string containing "high" (> 255) character values is not a UTF-8
string.
It's a fact. Learn to live with it.

I hope this helps :)
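A small sketch of points 1 and 4 above, assuming JSON::XS is installed: a character outside latin1 (U+263A, chosen only for illustration) is a single character in a Perl string, and it only turns into several octets once the JSON text is encoded as UTF-8.

```perl
use strict;
use warnings;
use JSON::XS;

# One character (U+263A WHITE SMILING FACE), stored as a single Perl character.
my $smiley = "\x{263a}";

# Without ->utf8, encode returns a character string: [ " <char> " ] = 5 characters.
my $chars = JSON::XS->new->encode([$smiley]);

# With ->utf8 (as encode_json does), the character becomes three octets: 2 + 3 + 2 = 7.
my $octets = JSON::XS->new->utf8->encode([$smiley]);

printf "characters: %d, octets: %d\n", length $chars, length $octets;
```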

OBJECT-ORIENTED INTERFACE
The object-oriented interface lets you configure your own encoding or
decoding style, within the limits of supported formats.

$json = new JSON::XS
Creates a new JSON::XS object that can be used to de/encode JSON
strings. All boolean flags described below are by default disabled.

The mutators for flags all return the JSON object again and thus
calls can be chained:

my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
=> {"a": [1, 2]}

$json = $json->ascii ([$enable])
$enabled = $json->get_ascii
If $enable is true (or missing), then the "encode" method will not
generate characters outside the code range 0..127 (which is ASCII).
Any Unicode characters outside that range will be escaped using
either a single \uXXXX (BMP characters) or a double \uHHHH\uLLLL
(surrogate pair) escape sequence, as per RFC4627. The resulting
encoded JSON text can be treated as a native Unicode string, an
ascii-encoded, latin1-encoded or UTF-8 encoded string, or any other
superset of ASCII.

If $enable is false, then the "encode" method will not escape
Unicode characters unless required by the JSON syntax or other
flags. This results in a faster and more compact format.

See also the section ENCODING/CODESET FLAG NOTES later in this
document.

The main use for this flag is to produce JSON texts that can be
transmitted over a 7-bit channel, as the encoded JSON texts will
not contain any 8-bit characters.

JSON::XS->new->ascii (1)->encode ([chr 0x10401])
=> ["\ud801\udc01"]

$json = $json->latin1 ([$enable])
$enabled = $json->get_latin1
If $enable is true (or missing), then the "encode" method will
encode the resulting JSON text as latin1 (or iso-8859-1), escaping
any characters outside the code range 0..255. The resulting string
can be treated as a latin1-encoded JSON text or a native Unicode
string. The "decode" method will not be affected in any way by this
flag, as "decode" by default expects Unicode, which is a strict
superset of latin1.

If $enable is false, then the "encode" method will not escape
Unicode characters unless required by the JSON syntax or other
flags.

See also the section ENCODING/CODESET FLAG NOTES later in this
document.

The main use for this flag is efficiently encoding binary data as
JSON text, as most octets will not be escaped, resulting in a
smaller encoded size. The disadvantage is that the resulting JSON
text is encoded in latin1 (and must correctly be treated as such
when storing and transferring), a rare encoding for JSON. It is
therefore most useful when you want to store data structures known
to contain binary data efficiently in files or databases, not when
talking to other JSON encoders/decoders.

JSON::XS->new->latin1->encode (["\x{89}\x{abc}"])
=> ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not)

$json = $json->utf8 ([$enable])
$enabled = $json->get_utf8
If $enable is true (or missing), then the "encode" method will
encode the JSON result into UTF-8, as required by many protocols,
while the "decode" method expects to be handed a UTF-8-encoded
string. Please note that UTF-8-encoded strings do not contain any
characters outside the range 0..255; they are thus useful for
bytewise/binary I/O. In future versions, enabling this option might
enable autodetection of the UTF-16 and UTF-32 encoding families, as
described in RFC4627.

If $enable is false, then the "encode" method will return the JSON
string as a (non-encoded) Unicode string, and "decode" accordingly
expects a Unicode string. Any decoding or encoding (e.g. to UTF-8 or
UTF-16) needs to be done yourself, e.g. using the Encode module.

See also the section ENCODING/CODESET FLAG NOTES later in this
document.

Example, output UTF-16BE-encoded JSON:

use Encode;
$jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);

Example, decode UTF-32LE-encoded JSON:

use Encode;
$object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
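To compare the three codeset flags side by side, the sketch below (assuming JSON::XS is installed) encodes the same two characters used in the latin1 example, checks the range of characters each output may contain, and decodes each form back. The ascii and latin1 outputs are ordinary Unicode strings, so a default decoder reads them; only the UTF-8 octets need a decoder with "utf8" enabled.

```perl
use strict;
use warnings;
use JSON::XS;

my $value = ["\x{89}\x{abc}"];    # U+0089 fits in latin1, U+0ABC does not

my $ascii  = JSON::XS->new->ascii->encode($value);     # only characters 0..127
my $latin1 = JSON::XS->new->latin1->encode($value);    # only characters 0..255
my $utf8   = JSON::XS->new->utf8->encode($value);      # UTF-8 octets

# ascii and latin1 output are valid Unicode strings, so a default decoder
# reads them; the UTF-8 octets need a decoder with ->utf8 enabled.
my $from_ascii  = JSON::XS->new->decode($ascii);
my $from_latin1 = JSON::XS->new->decode($latin1);
my $from_utf8   = JSON::XS->new->utf8->decode($utf8);
```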

$json = $json->pretty ([$enable])
This enables (or disables) all of the "indent", "space_before" and
"space_after" (and in the future possibly more) flags in one call
to generate the most readable (or most compact) form possible.

Example, pretty-print some simple structure:

my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
=>
{
   "a" : [
      1,
      2
   ]
}

$json = $json->indent ([$enable])
$enabled = $json->get_indent
If $enable is true (or missing), then the "encode" method will use
a multiline format as output, putting every array member or
object/hash key-value pair into its own line, indenting them
properly.

If $enable is false, no newlines or indenting will be produced, and
the resulting JSON text is guaranteed not to contain any
"newlines".

This setting has no effect when decoding JSON texts.

$json = $json->space_before ([$enable])
$enabled = $json->get_space_before
If $enable is true (or missing), then the "encode" method will add
an extra optional space before the ":" separating keys from values
in JSON objects.

If $enable is false, then the "encode" method will not add any
extra space at those places.

This setting has no effect when decoding JSON texts. You will also
most likely combine this setting with "space_after".

Example, space_before enabled, space_after and indent disabled:

{"key" :"value"}

$json = $json->space_after ([$enable])
$enabled = $json->get_space_after
If $enable is true (or missing), then the "encode" method will add
an extra optional space after the ":" separating keys from values
in JSON objects and extra whitespace after the "," separating
key-value pairs and array members.

If $enable is false, then the "encode" method will not add any
extra space at those places.

This setting has no effect when decoding JSON texts.

Example, space_before and indent disabled, space_after enabled:

{"key": "value"}
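A sketch comparing the whitespace flags on one small structure, assuming JSON::XS is installed. The single-key hash keeps the output independent of key order; the exact pretty-printed layout is not checked, only that it spans several lines.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = { key => [1, 2] };

my %out = (
    compact      => JSON::XS->new->encode($data),                 # {"key":[1,2]}
    space_before => JSON::XS->new->space_before->encode($data),   # {"key" :[1,2]}
    space_after  => JSON::XS->new->space_after->encode($data),    # {"key": [1, 2]}
    pretty       => JSON::XS->new->pretty->encode($data),         # multi-line
);

print "$_: $out{$_}\n" for sort keys %out;
```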

$json = $json->relaxed ([$enable])
$enabled = $json->get_relaxed
If $enable is true (or missing), then "decode" will accept some
extensions to normal JSON syntax (see below). "encode" will not be
affected in any way. Be aware that this option makes you accept
invalid JSON texts as if they were valid! I suggest using this
option only to parse application-specific files written by humans
(configuration files, resource files etc.)

If $enable is false (the default), then "decode" will only accept
valid JSON texts.

Currently accepted extensions are:

· list items can have an end-comma

JSON separates array elements and key-value pairs with commas.
This can be annoying if you write JSON texts manually and want
to be able to quickly append elements, so this extension
accepts a comma at the end of such items, not just between them:

[
1,
2, <- this comma not normally allowed
]
{
"k1": "v1",
"k2": "v2", <- this comma not normally allowed
}

· shell-style '#'-comments

Whenever JSON allows whitespace, shell-style comments are
additionally allowed. They are terminated by the first
carriage-return or line-feed character, after which more
whitespace and comments are allowed.

[
1, # this comment not allowed in JSON
# neither this one...
]

· literal ASCII TAB characters in strings

Literal ASCII TAB characters are now allowed in strings (and
treated as "\t").

[
"Hello\tWorld",
"Hello<TAB>World", # literal <TAB> would not normally be allowed
]
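A sketch of the first two relaxed extensions on a hand-written configuration text (its content is made up for illustration), assuming JSON::XS is installed. The same text is rejected by a default, strict decoder.

```perl
use strict;
use warnings;
use JSON::XS;

# Hand-written "config" with shell-style comments and trailing commas.
my $text = <<'EOT';
{
  "name": "demo",   # comment, only allowed in relaxed mode
  "ports": [80, 443,],
}
EOT

my $config = JSON::XS->new->relaxed->decode($text);

# The same text is rejected by a strict (default) decoder.
my $strict_ok = eval { JSON::XS->new->decode($text); 1 };
print $strict_ok ? "strict accepted\n" : "strict rejected: $@";
```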

$json = $json->canonical ([$enable])
$enabled = $json->get_canonical
If $enable is true (or missing), then the "encode" method will
output JSON objects by sorting their keys. This adds a
comparatively high overhead.

If $enable is false, then the "encode" method will output key-value
pairs in the order Perl stores them (which will likely change
between runs of the same script, and can change even within the
same run from Perl 5.18 onwards).

This option is useful if you want the same data structure to be
encoded as the same JSON text (given the same overall settings). If
it is disabled, the same hash might be encoded differently even if
it contains the same data, as key-value pairs have no inherent
ordering in Perl.

This setting has no effect when decoding JSON texts.

This setting currently has no effect on tied hashes.
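A minimal sketch of the guarantee "canonical" gives, assuming JSON::XS is installed: two hashes with the same contents, built in different orders, encode to the same text with sorted keys.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new->canonical;

# Keys are emitted in sorted order regardless of Perl's internal hash order,
# so equal hashes always produce identical text.
my $text_a = $coder->encode({ b => 1, a => 2, c => 3 });
my $text_b = $coder->encode({ c => 3, a => 2, b => 1 });

print "$text_a\n";    # {"a":2,"b":1,"c":3}
```

This makes canonical output suitable for comparing, hashing or caching encoded texts, at the cost of the sorting overhead mentioned above.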

$json = $json->allow_nonref ([$enable])
$enabled = $json->get_allow_nonref
If $enable is true (or missing), then the "encode" method can
convert a non-reference into its corresponding string, number or
null JSON value, which is an extension to RFC4627. Likewise,
"decode" will accept those JSON values instead of croaking.

If $enable is false, then the "encode" method will croak if it
isn't passed an arrayref or hashref, as JSON texts must either be
an object or array. Likewise, "decode" will croak if given
something that is not a JSON object or array.

Example, encode a Perl scalar as a JSON value with "allow_nonref"
enabled, resulting in a JSON text that is invalid under RFC4627:

JSON::XS->new->allow_nonref->encode ("Hello, World!")
=> "Hello, World!"
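A sketch of both settings of "allow_nonref", assuming JSON::XS is installed. The flag is set explicitly in both directions so the sketch does not rely on the default.

```perl
use strict;
use warnings;
use JSON::XS;

my $lenient = JSON::XS->new->allow_nonref;       # flag enabled explicitly
my $strict  = JSON::XS->new->allow_nonref(0);    # flag disabled explicitly

my $string_json = $lenient->encode("Hello, World!");    # a quoted JSON string
my $number      = $lenient->decode("42");               # a bare JSON number

# A plain scalar is not an acceptable top-level value for the strict coder.
my $strict_ok = eval { $strict->encode("Hello, World!"); 1 };
```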

$json = $json->allow_unknown ([$enable])
$enabled = $json->get_allow_unknown
If $enable is true (or missing), then "encode" will not throw an
exception when it encounters values it cannot represent in JSON
(for example, filehandles) but instead will encode a JSON "null"
value. Note that blessed objects are not included here and are
handled separately by "allow_blessed".

If $enable is false (the default), then "encode" will throw an
exception when it encounters anything it cannot encode as JSON.

This option does not affect "decode" in any way, and it is
recommended to leave it off unless you know your communications
partner.

$json = $json->allow_blessed ([$enable])
$enabled = $json->get_allow_blessed
See "OBJECT SERIALISATION" for details.

If $enable is true (or missing), then the "encode" method will not
barf when it encounters a blessed reference that it cannot convert
otherwise. Instead, a JSON "null" value is encoded in place of the
object.

If $enable is false (the default), then "encode" will throw an
exception when it encounters a blessed object that it cannot
convert otherwise.

This setting has no effect on "decode".

$json = $json->convert_blessed ([$enable])
$enabled = $json->get_convert_blessed
See "OBJECT SERIALISATION" for details.

If $enable is true (or missing), then "encode", upon encountering a
blessed object, will check for the availability of the "TO_JSON"
method on the object's class. If found, it will be called in scalar
context and the resulting scalar will be encoded instead of the
object.

The "TO_JSON" method may safely call die if it wants. If "TO_JSON"
returns other blessed objects, those will be handled in the same
way. "TO_JSON" must take care of not causing an endless recursion
cycle (== crash) in this case. The name of "TO_JSON" was chosen
because other methods called by the Perl core (== not by the user
of the object) are usually in upper case letters and to avoid
collisions with any "to_json" function or method.

If $enable is false (the default), then "encode" will not consider
this type of conversion.

This setting has no effect on "decode".
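A sketch contrasting "convert_blessed", "allow_blessed" and neither flag on the same object, assuming JSON::XS is installed. "Point" is a hypothetical class used only for illustration; "canonical" is added just to make the key order predictable.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical example class, used only for illustration.
package Point {
    sub new {
        my ($class, %args) = @_;
        my $self = { %args };
        return bless $self, $class;
    }

    # Called by JSON::XS (in scalar context) when convert_blessed is enabled.
    sub TO_JSON {
        my ($self) = @_;
        return { x => $self->{x}, y => $self->{y} };
    }
}

my $p = Point->new(x => 1, y => 2);

my $converted = JSON::XS->new->canonical->convert_blessed->encode([$p]);   # uses TO_JSON
my $nulled    = JSON::XS->new->allow_blessed->encode([$p]);                # object -> null

# With neither flag enabled, encoding a blessed object croaks.
my $plain_ok = eval { JSON::XS->new->encode([$p]); 1 };
```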

$json = $json->allow_tags ([$enable])
$enabled = $json->get_allow_tags
See "OBJECT SERIALISATION" for details.

If $enable is true (or missing), then "encode", upon encountering a
blessed object, will check for the availability of the "FREEZE"
method on the object's class. If found, it will be used to
serialise the object into a nonstandard tagged JSON value (that
other JSON decoders cannot decode).

It also causes "decode" to parse such tagged JSON values and
deserialise them via a call to the "THAW" method.

If $enable is false (the default), then "encode" will not consider
this type of conversion, and tagged JSON values will cause a parse
error in "decode", as if tags were not part of the grammar.
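A round-trip sketch for "allow_tags", assuming JSON::XS is installed. "Colour" is a hypothetical class, and the calling convention used here ("FREEZE" gets the object and the serialiser name "JSON" and returns a list; "THAW" gets the class name, the serialiser name and that list) is an assumption based on the OBJECT SERIALISATION section, so check it there before relying on it.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical class used only for illustration.
package Colour {
    sub new { my ($class, @rgb) = @_; return bless [@rgb], $class }

    # Assumed convention: receives the object and the serialiser name ("JSON");
    # the returned list becomes the values of the tagged JSON value.
    sub FREEZE { my ($self, $serialiser) = @_; return @$self }

    # Assumed convention: receives the class name, the serialiser name and
    # the values produced by FREEZE.
    sub THAW { my ($class, $serialiser, @rgb) = @_; return $class->new(@rgb) }
}

my $coder = JSON::XS->new->allow_tags;
my $text  = $coder->encode([Colour->new(255, 128, 0)]);
my $back  = $coder->decode($text);

# A decoder without allow_tags treats the tag as a syntax error.
my $plain_ok = eval { JSON::XS->new->decode($text); 1 };
```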

$json = $json->filter_json_object ([$coderef->($hashref)])
When $coderef is specified, it will be called from "decode" each
time it decodes a JSON object. The only argument is a reference to
the newly-created hash. If the code reference returns a single
scalar (which need not be a reference), this value (i.e. a copy of
that scalar to avoid aliasing) is inserted into the deserialised
data structure. If it returns an empty list (NOTE: not "undef",
which is a valid scalar), the original deserialised hash will be
inserted. This setting can slow down decoding considerably.

When $coderef is omitted or undefined, any existing callback will
be removed and "decode" will not change the deserialised hash in
any way.

Example, convert all JSON objects into the integer 5:

my $js = JSON::XS->new->filter_json_object (sub { 5 });
# returns [5]
$js->decode ('[{}]');
# throws an exception because allow_nonref is not enabled,
# so a lone 5 is not allowed.
$js->decode ('{"a":1, "b":2}');

$json = $json->filter_json_single_key_object ($key [=> $coderef->($value)])
Works similarly to "filter_json_object", but is only called for
JSON objects having a single key named $key.

This $coderef is called before the one specified via
"filter_json_object", if any. It gets passed the single value in
the JSON object. If it returns a single value, it will be inserted
into the data structure. If it returns nothing (not even "undef"
but the empty list), the callback from "filter_json_object" will be
called next, as if no single-key callback were specified.

If $coderef is omitted or undefined, the corresponding callback
will be disabled. There can only ever be one callback for a given
key.

As this callback gets called less often than the
"filter_json_object" one, decoding speed will not usually suffer as
much. Therefore, single-key objects make excellent targets to
serialise Perl objects into, especially as single-key JSON objects
are as close to the type-tagged value concept as JSON gets (it's
basically an ID/VALUE tuple). Of course, JSON does not support this
in any way, so you need to make sure your data never looks like a
serialised Perl hash.

Typical names for the single object key are "__class_whatever__",
or "$__dollars_are_rarely_used__$" or "}ugly_brace_placement", or
even things like "__class_md5sum(classname)__", to reduce the risk
of clashing with real hashes.

Example, decode JSON objects of the form "{ "__widget__": <id> }"
into the corresponding $WIDGET{<id>} object:

# return whatever is in $WIDGET{5}:
JSON::XS
->new
->filter_json_single_key_object (__widget__ => sub {
$WIDGET{ $_[0] }
})
->decode ('{"__widget__": 5}');

# this can be used with a TO_JSON method in some "widget" class
# for serialisation to json:
sub WidgetBase::TO_JSON {
my ($self) = @_;

unless ($self->{id}) {
$self->{id} = ++$LAST_WIDGET_ID; # hypothetical global id counter
$WIDGET{$self->{id}} = $self;
}

return { __widget__ => $self->{id} };
}
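A self-contained variant of the widget example, assuming JSON::XS is installed. The %WIDGET registry and its contents are hypothetical; the sketch shows that only single-key objects with the registered key are replaced, while other objects pass through unchanged.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical registry of already-existing objects, keyed by id.
my %WIDGET = (5 => "widget number five");

my $coder = JSON::XS->new->filter_json_single_key_object(
    __widget__ => sub { $WIDGET{ $_[0] } },    # receives the single value, here 5
);

# Only the {"__widget__": ...} object is replaced; other hashes pass through.
my $data = $coder->decode('[{"__widget__": 5}, {"other": 1}]');
```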

$json = $json->shrink ([$enable])
$enabled = $json->get_shrink
Perl usually over-allocates memory a bit when allocating space for
strings. This flag optionally resizes strings generated by either
"encode" or "decode" to their minimum size possible. This can save
memory when your JSON texts are either very very long or you have
many short strings. It will also try to downgrade any strings to
octet-form if possible: perl stores strings internally either in an
encoding called UTF-X or in octet-form. The latter cannot store
everything but uses less space in general (and some buggy Perl or C
code might even rely on that internal representation being used).

The actual definition of what shrink does might change in future
versions, but it will always try to save space at the expense of
time.

If $enable is true (or missing), both the string returned by
"encode" and all strings generated by "decode" will be
shrunk-to-fit.

If $enable is false, then the normal perl allocation algorithms are
used. If you continue to work with your data, then this is likely
to be faster.

In the future, this setting might control other things, such as
converting strings that look like integers or floats into integers
or floats internally (there is no difference on the Perl level),
saving space.

$json = $json->max_depth ([$maximum_nesting_depth])
$max_depth = $json->get_max_depth
Sets the maximum nesting level (default 512) accepted while
encoding or decoding. If a higher nesting level is detected in JSON
text or a Perl data structure, then the encoder and decoder will
stop and croak at that point.

Nesting level is defined by the number of hash- or arrayrefs that
the encoder needs to traverse to reach a given point or the number
of "{" or "[" characters without their matching closing bracket
crossed to reach a given character in a string.

Setting the maximum depth to one disallows any nesting, which
ensures that the object is only a single hash/object or array.

If no argument is given, the highest possible setting will be used,
which is rarely useful.

Note that nesting is implemented by recursion in C. The default
value has been chosen to be as large as typical operating systems
allow without crashing.

See SECURITY CONSIDERATIONS, below, for more info on why this is
useful.

$json = $json->max_size ([$maximum_string_size])
$max_size = $json->get_max_size
Sets the maximum length a JSON text may have (in bytes) where
decoding is being attempted. The default is 0, meaning no limit.
When "decode" is called on a string that is longer than this many
bytes, it will not attempt to decode the string but throw an
exception. This setting has no effect on "encode" (yet).

If no argument is given, the limit check will be deactivated (same
as when 0 is specified).

See SECURITY CONSIDERATIONS, below, for more info on why this is
useful.
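A sketch of both resource limits, assuming JSON::XS is installed. The limit values 2 and 8 are arbitrary and chosen only so that one input on each side of the limit is easy to see; real limits depend on your data.

```perl
use strict;
use warnings;
use JSON::XS;

my $shallow = JSON::XS->new->max_depth(2);    # at most two nested levels of [ or {
my $small   = JSON::XS->new->max_size(8);     # at most 8 bytes of input

my $depth_ok     =  eval { $shallow->decode('[[1]]');     1 };    # depth 2: accepted
my $depth_failed = !eval { $shallow->decode('[[[1]]]');   1 };    # depth 3: croaks
my $size_ok      =  eval { $small->decode('[1,2,3]');     1 };    # 7 bytes: accepted
my $size_failed  = !eval { $small->decode('[1,2,3,4]');   1 };    # 9 bytes: croaks
```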

$json_text = $json->encode ($perl_scalar)
Converts the given Perl value or data structure to its JSON
representation. Croaks on error.

$perl_scalar = $json->decode ($json_text)
The opposite of "encode": expects a JSON text and tries to parse
it, returning the resulting simple scalar or reference. Croaks on
error.

($perl_scalar, $characters) = $json->decode_prefix ($json_text)
This works like the "decode" method, but instead of raising an
exception when there is trailing garbage after the first JSON
object, it will silently stop parsing there and return the number
of characters consumed so far.

This is useful if your JSON texts are not delimited by an outer
protocol and you need to know where the JSON text ends.

JSON::XS->new->decode_prefix ("[1] the tail")
=> ([1], 3)
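A sketch of using the character count returned by "decode_prefix" to recover the text after the JSON value, assuming JSON::XS is installed.

```perl
use strict;
use warnings;
use JSON::XS;

my $input = '[1] the tail';

# In list context, decode_prefix returns the value and the number of
# characters it consumed, so the remainder can be recovered with substr.
my ($value, $consumed) = JSON::XS->new->decode_prefix($input);
my $tail = substr $input, $consumed;
```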

INCREMENTAL PARSING
In some cases, there is the need for incremental parsing of JSON texts.
While this module always has to keep both JSON text and resulting Perl
data structure in memory at one time, it does allow you to parse a JSON
stream incrementally. It does so by accumulating text until it has a
full JSON object, which it then can decode. This process is similar to
using "decode_prefix" to see if a full JSON object is available, but is
much more efficient (and can be implemented with a minimum of method
calls).

JSON::XS will only attempt to parse the JSON text once it is sure it
has enough text to get a decisive result, using a very simple but truly
incremental parser. This means that it sometimes won't stop as early as
the full parser; for example, it doesn't detect mismatched parentheses.
The only thing it guarantees is that it starts decoding as soon as a
syntactically valid JSON text has been seen. This means you need to set
resource limits (e.g. "max_size") to ensure the parser will stop
parsing in the presence of syntax errors.

The following methods implement this incremental parser.

[void, scalar or list context] = $json->incr_parse ([$string])
This is the central parsing function. It can both append new text
and extract objects from the stream accumulated so far (both of
these functions are optional).

If $string is given, then this string is appended to the already
existing JSON fragment stored in the $json object.

After that, if the function is called in void context, it will
simply return without doing anything further. This can be used to
add more text in as many chunks as you want.

If the method is called in scalar context, then it will try to
extract exactly one JSON object. If that is successful, it will
return this object, otherwise it will return "undef". If there is a
parse error, this method will croak just as "decode" would do (one
can then use "incr_skip" to skip the erroneous part). This is the
most common way of using the method.

And finally, in list context, it will try to extract as many
objects from the stream as it can find and return them, or the
empty list otherwise. For this to work, there must be no separators
(other than whitespace) between the JSON objects or arrays; instead,
they must be concatenated back-to-back. If an error occurs, an
exception will be raised as in the scalar context case. Note that
in this case, any previously-parsed JSON texts will be lost.

Example: Parse some JSON arrays/objects in a given string and
return them.

my @objs = JSON::XS->new->incr_parse ("[5][7][1,2]");
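A sketch of the most common pattern described above, assuming JSON::XS is installed: text arrives in arbitrary chunks (the chunk boundaries below are made up and deliberately split values), each chunk is appended in void context, and complete values are pulled out one at a time in scalar context.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# Hypothetical input arriving in arbitrary chunks, split mid-value.
my @chunks = ('[1,', '2]{"a"', ':3}', ' [4]');
my @objects;

for my $chunk (@chunks) {
    $json->incr_parse($chunk);                # void context: only append
    while (my $obj = $json->incr_parse) {     # scalar context: extract one value
        push @objects, $obj;
    }
}
```

Scalar context is used here rather than list context so that a later call to "incr_text" would remain legal after each extracted value.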

$lvalue_string = $json->incr_text
This method returns the currently stored JSON fragment as an
lvalue, that is, you can manipulate it. This only works when a
preceding call to "incr_parse" in scalar context successfully
returned an object. Under all other circumstances you must not call
this function (I mean it. Although in simple tests it might
actually work, it will fail under real world conditions). As a
special exception, you can also call this method before having
parsed anything.

That means you can only use this function to look at or manipulate
text before or after complete JSON objects, not while the parser is
in the middle of parsing a JSON object.

This function is useful in two cases: a) finding the trailing text
after a JSON object or b) parsing multiple JSON objects separated
by non-JSON text (such as commas).

$json->incr_skip
This will reset the state of the incremental parser and will remove
the text parsed so far from the input buffer. This is useful after
"incr_parse" died, in which case the input buffer and incremental
parser state is left unchanged, to skip the text parsed so far and
to reset the parse state.

The difference from "incr_reset" is that only text until the parse
error occurred is removed.

$json->incr_reset
This completely resets the incremental parser, that is, after this
call, it will be as if the parser had never parsed anything.

This is useful if you want to repeatedly parse JSON objects and
want to ignore any trailing data, which means you have to reset the
parser after each successful decode.

LIMITATIONS
All options that affect decoding are supported, except "allow_nonref".
The reason for this is that it cannot be made to work sensibly: JSON
objects and arrays are self-delimited, i.e. you can concatenate them
back to back and still decode them perfectly. This does not hold true
for JSON numbers, however.

For example, is the string 1 a single JSON number, or is it simply the
start of 12? Or is 12 a single JSON number, or the concatenation of 1
and 2? In neither case can you tell, and this is why JSON::XS takes the
conservative route and disallows this case.

747 EXAMPLES
748 Some examples will make all this clearer. First, a simple example that
749 works similarly to "decode_prefix": We want to decode the JSON object
750 at the start of a string and identify the portion after the JSON
751 object:
752
753 my $text = "[1,2,3] hello";
754
755 my $json = new JSON::XS;
756
757 my $obj = $json->incr_parse ($text)
758 or die "expected JSON object or array at beginning of string";
759
760 my $tail = $json->incr_text;
761 # $tail now contains " hello"
762
763 Easy, isn't it?
764
765 Now for a more complicated example: Imagine a hypothetical protocol
766 where you read some requests from a TCP stream, and each request is a
767 JSON array, without any separation between them (in fact, it is often
768 useful to use newlines as "separators", as these get interpreted as
769 whitespace at the start of the JSON text, which makes it possible to
770 test said protocol with "telnet"...).
771
772 Here is how you'd do it (it is trivial to write this in an event-based
773 manner):
774
775 my $json = JSON::XS->new;
776
777 # read some data from the socket
778 while (sysread $socket, my $buf, 4096) {
779
780 # split and decode as many requests as possible
781 for my $request ($json->incr_parse ($buf)) {
782 # act on the $request
783 }
784 }
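To try this request loop without a real $socket, you can feed "incr_parse" hard-coded chunks instead of sysread results. In this sketch the chunks are invented stand-ins for network reads, and one request is deliberately split across two of them; the parser simply keeps the partial text until the rest arrives:

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;
my @requests;

# stand-ins for successive sysread results; the second request is split
my @chunks = ('["ping",1]["pi', 'ng",2]', qq{\n["quit"]\n});

for my $buf (@chunks) {
    # list context: return every request that is complete so far
    for my $request ($json->incr_parse($buf)) {
        push @requests, $request;   # act on the $request here
    }
}

print scalar(@requests), " requests\n";   # prints "3 requests"
```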
785
786 Another complicated example: Assume you have a string with JSON objects
787 or arrays, all separated by (optional) comma characters (e.g. "[1],[2],
788 [3]"). To parse them, we have to skip the commas between the JSON
789 texts, and here is where the lvalue-ness of "incr_text" comes in
790 useful:
791
792 my $text = "[1],[2], [3]";
793 my $json = JSON::XS->new;
794
795 # void context, so no parsing done
796 $json->incr_parse ($text);
797
798 # now extract as many objects as possible. note the
799 # use of scalar context so incr_text can be called.
800 while (my $obj = $json->incr_parse) {
801 # do something with $obj
802
803 # now skip the optional comma
804 $json->incr_text =~ s/^ \s* , //x;
805 }
806
807 Now let's go for a very complex example: Assume that you have a gigantic
808 JSON array-of-objects, many gigabytes in size, and you want to parse
809 it, but you cannot load it into memory fully (this has actually
810 happened in the real world :).
811
812 Well, you lost, you have to implement your own JSON parser. But
813 JSON::XS can still help you: You implement a (very simple) array parser
814 and let JSON::XS decode the array elements, which are all full JSON objects
815 on their own (this wouldn't work if the array elements could be JSON
816 numbers, for example):
817
818 my $json = JSON::XS->new;
819
820 # open the monster
821 open my $fh, "<", "bigfile.json"
822 or die "bigfile: $!";
823
824 # first parse the initial "["
825 for (;;) {
826 sysread $fh, my $buf, 65536
827 or die "read error: $!";
828 $json->incr_parse ($buf); # void context, so no parsing
829
830 # Exit the loop once we found and removed(!) the initial "[".
831 # In essence, we are (ab-)using the $json object as a simple scalar
832 # we append data to.
833 last if $json->incr_text =~ s/^ \s* \[ //x;
834 }
835
836 # now we have skipped the initial "[", so continue
837 # parsing all the elements.
838 for (;;) {
839 # in this loop we read data until we got a single JSON object
840 for (;;) {
841 if (my $obj = $json->incr_parse) {
842 # do something with $obj
843 last;
844 }
845
846 # add more data
847 sysread $fh, my $buf, 65536
848 or die "read error: $!";
849 $json->incr_parse ($buf); # void context, so no parsing
850 }
851
852 # in this loop we read data until we either found and parsed the
853 # separating "," between elements, or the final "]"
854 for (;;) {
855 # first skip whitespace
856 $json->incr_text =~ s/^\s*//;
857
858 # if we find "]", we are done
859 if ($json->incr_text =~ s/^\]//) {
860 print "finished.\n";
861 exit;
862 }
863
864 # if we find ",", we can continue with the next element
865 if ($json->incr_text =~ s/^,//) {
866 last;
867 }
868
869 # if we find anything else, we have a parse error!
870 if (length $json->incr_text) {
871 die "parse error near ", $json->incr_text;
872 }
873
874 # else add more data
875 sysread $fh, my $buf, 65536
876 or die "read error: $!";
877 $json->incr_parse ($buf); # void context, so no parsing
878 }
879
880 This is a complex example, but most of the complexity comes from the
881 fact that we are trying to be correct (bear with me if I am wrong, I
882 never ran the above example :).
883
884 MAPPING
885 This section describes how JSON::XS maps Perl values to JSON values and
886 vice versa. These mappings are designed to "do the right thing" in most
887 circumstances automatically, preserving round-tripping characteristics
888 (what you put in comes out as something equivalent).
889
890 For the more enlightened: note that in the following descriptions,
891 lowercase perl refers to the Perl interpreter, while uppercase Perl
892 refers to the abstract Perl language itself.
893
894 JSON -> PERL
895 object
896 A JSON object becomes a reference to a hash in Perl. No ordering of
897 object keys is preserved (JSON does not preserve object key
898 ordering itself).
899
900 array
901 A JSON array becomes a reference to an array in Perl.
902
903 string
904 A JSON string becomes a string scalar in Perl - Unicode codepoints
905 in JSON are represented by the same codepoints in the Perl string,
906 so no manual decoding is necessary.
907
908 number
909 A JSON number becomes either an integer, numeric (floating point)
910 or string scalar in perl, depending on its range and any fractional
911 parts. On the Perl level, there is no difference between those as
912 Perl handles all the conversion details, but an integer may take
913 slightly less memory and might represent more values exactly than
914 floating point numbers.
915
916 If the number consists of digits only, JSON::XS will try to
917 represent it as an integer value. If that fails, it will try to
918 represent it as a numeric (floating point) value if that is
919 possible without loss of precision. Otherwise it will preserve the
920 number as a string value (in which case you lose roundtripping
921 ability, as the JSON number will be re-encoded to a JSON string).
922
923 Numbers containing a fractional or exponential part will always be
924 represented as numeric (floating point) values, possibly at a loss
925 of precision (in which case you might lose perfect roundtripping
926 ability, but the JSON number will still be re-encoded as a JSON
927 number).
928
929 Note that precision is not accuracy - binary floating point values
930 cannot represent most decimal fractions exactly, and when
931 converting from and to floating point, JSON::XS only guarantees
932 precision up to but not including the least significant bit.
933
934 true, false
935 These JSON atoms become "Types::Serialiser::true" and
936 "Types::Serialiser::false", respectively. They are overloaded to
937 act almost exactly like the numbers 1 and 0. You can check whether
938 a scalar is a JSON boolean by using the
939 "Types::Serialiser::is_bool" function (after "use
940 Types::Serialiser", of course).
941
942 null
943 A JSON null atom becomes "undef" in Perl.
944
945 shell-style comments ("# text")
946 As a nonstandard extension to the JSON syntax that is enabled by
947 the "relaxed" setting, shell-style comments are allowed. They can
948 start anywhere outside strings and go till the end of the line.
949
950 tagged values ("(tag)value")
951 Another nonstandard extension to the JSON syntax, enabled with the
952 "allow_tags" setting, are tagged values. In this implementation,
953 the tag must be a perl package/class name encoded as a JSON string,
954 and the value must be a JSON array encoding optional constructor
955 arguments.
956
957 See "OBJECT SERIALISATION", below, for details.
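A short sketch that decodes one text exercising several of the rules above and then re-encodes it. The input is invented for illustration, and "relaxed" is enabled only so that the shell-style comment is accepted:

```perl
use strict;
use warnings;
use JSON::XS;
use Types::Serialiser;

my $coder = JSON::XS->new->relaxed;   # relaxed: allow "# ..." comments

my $data = $coder->decode(<<'JSON');
[ 1, 1.5,
  100000000000000000000000,  # too large for an integer, too long for an exact float
  true, false, null ]
JSON

my ($int, $float, $big, $t, $f, $null) = @$data;

# true/false are Types::Serialiser booleans that act like 1 and 0
print "true and false are booleans\n"
    if Types::Serialiser::is_bool($t) && Types::Serialiser::is_bool($f);

my $out = $coder->encode($data);
print "$out\n";   # prints [1,1.5,"100000000000000000000000",true,false,null]
```

Note how the 24-digit integer comes back as a JSON string: this is the loss of roundtripping described under "number".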
958
959 PERL -> JSON
960 The mapping from Perl to JSON is slightly more difficult, as Perl is a
961 truly typeless language, so we can only guess which JSON type is meant
962 by a Perl value.
963
964 hash references
965 Perl hash references become JSON objects. As there is no inherent
966 ordering in hash keys (or JSON objects), they will usually be
967 encoded in a pseudo-random order. JSON::XS can optionally sort the
968 hash keys (determined by the canonical flag), so the same
969 data structure will serialise to the same JSON text (given the same
970 settings and version of JSON::XS), but this incurs a runtime
971 overhead and is only rarely useful, e.g. when you want to compare
972 some JSON text against another for equality.
973
974 array references
975 Perl array references become JSON arrays.
976
977 other references
978 Other unblessed references are generally not allowed and will cause
979 an exception to be thrown, except for references to the integers 0
980 and 1, which get turned into "false" and "true" atoms in JSON.
981
982 Since "JSON::XS" uses the boolean model from Types::Serialiser, you
983 can also "use Types::Serialiser" and then use
984 "Types::Serialiser::false" and "Types::Serialiser::true" to improve
985 readability.
986
987 use Types::Serialiser;
988 encode_json [\0, Types::Serialiser::true] # yields [false,true]
989
990 Types::Serialiser::true, Types::Serialiser::false
991 These special values from the Types::Serialiser module become JSON
992 true and JSON false values, respectively. You can also use "\1" and
993 "\0" directly if you want.
994
995 blessed objects
996 Blessed objects are not directly representable in JSON, but
997 "JSON::XS" allows various ways of handling objects. See "OBJECT
998 SERIALISATION", below, for details.
999
1000 simple scalars
1001 Simple Perl scalars (any scalar that is not a reference) are the
1002 most difficult objects to encode: JSON::XS will encode undefined
1003 scalars as JSON "null" values, scalars that have last been used in
1004 a string context before encoding as JSON strings, and anything else
1005 as a number value:
1006
1007 # dump as number
1008 encode_json [2] # yields [2]
1009 encode_json [-3.0e17] # yields [-3e+17]
1010 my $value = 5; encode_json [$value] # yields [5]
1011
1012 # used as string, so dump as string
1013 print $value;
1014 encode_json [$value] # yields ["5"]
1015
1016 # undef becomes null
1017 encode_json [undef] # yields [null]
1018
1019 You can force the type to be a JSON string by stringifying it:
1020
1021 my $x = 3.1; # some variable containing a number
1022 "$x"; # stringified
1023 $x .= ""; # another, more awkward way to stringify
1024 print $x; # perl does it for you, too, quite often
1025
1026 You can force the type to be a JSON number by numifying it:
1027
1028 my $x = "3"; # some variable containing a string
1029 $x += 0; # numify it, ensuring it will be dumped as a number
1030 $x *= 1; # same thing, the choice is yours.
1031
1032 You can not currently force the type in other, less obscure, ways.
1033 Tell me if you need this capability (but don't forget to explain
1034 why it's needed :).
1035
1036 Note that numerical precision has the same meaning as under Perl
1037 (so binary to decimal conversion follows the same rules as in Perl,
1038 which can differ from other languages). Also, your perl interpreter
1039 might expose extensions to the floating point numbers of your
1040 platform, such as infinities or NaNs - these cannot be represented
1041 in JSON, and it is an error to pass those in.
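The following sketch shows the numify/stringify tricks and the "\1"/"\0" shortcut together. The data is invented, and "canonical" is enabled only to make the key order (and thus the output) predictable:

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new->canonical;   # sort keys for reproducible output

my $num = "3";
$num += 0;          # numified: encoded as a JSON number

my $str = 3.1;
$str .= "";         # stringified: encoded as a JSON string

my $out = $coder->encode({
    b     => $num,
    a     => $str,
    flags => [\1, \0],   # references to 1 and 0 become true and false
    none  => undef,      # undef becomes null
});

print "$out\n";   # prints {"a":"3.1","b":3,"flags":[true,false],"none":null}
```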
1042
1043 OBJECT SERIALISATION
1044 As JSON cannot directly represent Perl objects, you have to choose
1045 between a pure JSON representation (without the ability to deserialise
1046 the object automatically again), and a nonstandard extension to the
1047 JSON syntax, tagged values.
1048
1049 SERIALISATION
1050
1051 What happens when "JSON::XS" encounters a Perl object depends on the
1052 "allow_blessed", "convert_blessed" and "allow_tags" settings, which are
1053 used in this order:
1054
1055 1. "allow_tags" is enabled and the object has a "FREEZE" method.
1056 In this case, "JSON::XS" uses the Types::Serialiser object
1057 serialisation protocol to create a tagged JSON value, using a
1058 nonstandard extension to the JSON syntax.
1059
1060 This works by invoking the "FREEZE" method on the object, with the
1061 first argument being the object to serialise, and the second
1062 argument being the constant string "JSON" to distinguish it from
1063 other serialisers.
1064
1065 The "FREEZE" method can return any number of values (i.e. zero or
1066 more). These values and the package/classname of the object will
1067 then be encoded as a tagged JSON value in the following format:
1068
1069 ("classname")[FREEZE return values...]
1070
1071 e.g.:
1072
1073 ("URI")["http://www.google.com/"]
1074 ("MyDate")[2013,10,29]
1075 ("ImageData::JPEG")["Z3...VlCg=="]
1076
1077 For example, the hypothetical "My::Object" "FREEZE" method might
1078 use the object's "type" and "id" members to encode the object:
1079
1080 sub My::Object::FREEZE {
1081 my ($self, $serialiser) = @_;
1082
1083 ($self->{type}, $self->{id})
1084 }
1085
1086 2. "convert_blessed" is enabled and the object has a "TO_JSON" method.
1087 In this case, the "TO_JSON" method of the object is invoked in
1088 scalar context. It must return a single scalar that can be directly
1089 encoded into JSON. This scalar replaces the object in the JSON
1090 text.
1091
1092 For example, the following "TO_JSON" method will convert all URI
1093 objects to JSON strings when serialised. The fact that these values
1094 originally were URI objects is lost.
1095
1096 sub URI::TO_JSON {
1097 my ($uri) = @_;
1098 $uri->as_string
1099 }
1100
1101 3. "allow_blessed" is enabled.
1102 The object will be serialised as a JSON null value.
1103
1104 4. none of the above
1105 If none of the settings are enabled or the respective methods are
1106 missing, "JSON::XS" throws an exception.
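A minimal sketch of rules 2 to 4, using a hypothetical "My::Point" class that exists only for this example:

```perl
use strict;
use warnings;
use JSON::XS;

# hypothetical class, only used to illustrate the object rules
package My::Point;
sub new     { my ($class, %args) = @_; my $self = {%args}; return bless $self, $class }
sub TO_JSON { my ($self) = @_; return [ $self->{x}, $self->{y} ] }

package main;

my $point = My::Point->new(x => 1, y => 2);

# rule 2: TO_JSON's return value replaces the object
my $converted = JSON::XS->new->convert_blessed->encode([$point]);
print "$converted\n";   # prints [[1,2]]

# rule 3: with only allow_blessed, the object becomes null
my $nulled = JSON::XS->new->allow_blessed->encode([$point]);
print "$nulled\n";      # prints [null]

# rule 4: with none of the settings enabled, encode throws an exception
my $ok = eval { JSON::XS->new->encode([$point]); 1 };
print "exception: $@" unless $ok;
```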
1107
1108 DESERIALISATION
1109
1110 For deserialisation there are only two cases to consider: either
1111 nonstandard tagging was used, in which case "allow_tags" decides, or
1112 objects cannot be deserialised automatically, in which case you can
1113 use postprocessing or the "filter_json_object" or
1114 "filter_json_single_key_object" callbacks to get some real objects out
1115 of your JSON.
1116
1117 This section only considers the tagged value case: if a tagged JSON
1118 object is encountered during decoding and "allow_tags" is disabled, a
1119 parse error will result (as if tagged values were not part of the
1120 grammar).
1121
1122 If "allow_tags" is enabled, "JSON::XS" will look up the "THAW" method
1123 of the package/classname used during serialisation (it will not attempt
1124 to load the package as a Perl module). If there is no such method, the
1125 decoding will fail with an error.
1126
1127 Otherwise, the "THAW" method is invoked with the classname as first
1128 argument, the constant string "JSON" as second argument, and all the
1129 values from the JSON array (the values originally returned by the
1130 "FREEZE" method) as remaining arguments.
1131
1132 The method must then return the object. While technically you can
1133 return any Perl scalar, you might have to enable the "allow_nonref"
1134 setting to make that work in all cases, so better return an actual
1135 blessed reference.
1136
1137 As an example, let's implement a "THAW" function that regenerates the
1138 "My::Object" from the "FREEZE" example earlier:
1139
1140 sub My::Object::THAW {
1141 my ($class, $serialiser, $type, $id) = @_;
1142
1143 $class->new (type => $type, id => $id)
1144 }
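Putting the "FREEZE" and "THAW" examples together gives a full round trip. This sketch adds a hypothetical "new" constructor, which the examples above assume but do not show:

```perl
use strict;
use warnings;
use JSON::XS;

# hypothetical class from the FREEZE/THAW examples
package My::Object;

sub new {
    my ($class, %args) = @_;
    my $self = {%args};
    return bless $self, $class;
}

sub FREEZE {
    my ($self, $serialiser) = @_;   # $serialiser is the string "JSON"
    ($self->{type}, $self->{id})
}

sub THAW {
    my ($class, $serialiser, $type, $id) = @_;
    $class->new(type => $type, id => $id)
}

package main;

my $coder = JSON::XS->new->allow_tags;

my $text = $coder->encode([ My::Object->new(type => "user", id => 42) ]);
print "$text\n";   # prints [("My::Object")["user",42]]

# decoding calls My::Object::THAW with the values FREEZE returned
my $copy = $coder->decode($text)->[0];
print ref($copy), ": type=$copy->{type} id=$copy->{id}\n";
```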
1145
1146 ENCODING/CODESET FLAG NOTES
1147 The interested reader might have seen a number of flags that signify
1148 encodings or codesets - "utf8", "latin1" and "ascii". There seems to be
1149 some confusion on what these do, so here is a short comparison:
1150
1151 "utf8" controls whether the JSON text created by "encode" (and expected
1152 by "decode") is UTF-8 encoded or not, while "latin1" and "ascii" only
1153 control whether "encode" escapes character values outside their
1154 respective codeset range. None of these flags conflict with one
1155 another, although some combinations make less sense than others.
1156
1157 Care has been taken to make all flags symmetrical with respect to
1158 "encode" and "decode", that is, texts encoded with any combination of
1159 these flag values will be correctly decoded when the same flags are
1160 used - in general, if you use different flag settings while encoding
1161 vs. when decoding you likely have a bug somewhere.
1162
1163 Below comes a verbose discussion of these flags. Note that a "codeset"
1164 is simply an abstract set of character-codepoint pairs, while an
1165 encoding takes those codepoint numbers and encodes them, in our case
1166 into octets. Unicode is (among other things) a codeset, UTF-8 is an
1167 encoding, and ISO-8859-1 (= latin 1) and ASCII are both codesets and
1168 encodings at the same time, which can be confusing.
1169
1170 "utf8" flag disabled
1171 When "utf8" is disabled (the default), then "encode"/"decode"
1172 generate and expect Unicode strings, that is, characters with high
1173 ordinal Unicode values (> 255) will be encoded as such characters,
1174 and likewise such characters are decoded as-is, no changes to them
1175 will be done, except "(re-)interpreting" them as Unicode codepoints
1176 or Unicode characters, respectively (to Perl, these are the same
1177 thing in strings unless you do funny/weird/dumb stuff).
1178
1179 This is useful when you want to do the encoding yourself (e.g. when
1180 you want to have UTF-16 encoded JSON texts) or when some other
1181 layer does the encoding for you (for example, when printing to a
1182 terminal using a filehandle that transparently encodes to UTF-8 you
1183 certainly do NOT want to UTF-8 encode your data first and have Perl
1184 encode it another time).
1185
1186 "utf8" flag enabled
1187 If the "utf8"-flag is enabled, "encode"/"decode" will encode all
1188 characters using the corresponding UTF-8 multi-byte sequence, and
1189 will expect your input strings to be encoded as UTF-8, that is, no
1190 "character" of the input string must have any value > 255, as UTF-8
1191 does not allow that.
1192
1193 The "utf8" flag therefore switches between two modes: disabled
1194 means you will get a Unicode string in Perl, enabled means you get
1195 a UTF-8 encoded octet/binary string in Perl.
1196
1197 "latin1" or "ascii" flags enabled
1198 With "latin1" (or "ascii") enabled, "encode" will escape characters
1199 with ordinal values > 255 (> 127 with "ascii") and encode the
1200 remaining characters as specified by the "utf8" flag.
1201
1202 If "utf8" is disabled, then the result is also correctly encoded in
1203 those character sets (as both are proper subsets of Unicode,
1204 meaning that a Unicode string with all character values < 256 is
1205 the same thing as an ISO-8859-1 string, and a Unicode string with
1206 all character values < 128 is the same thing as an ASCII string in
1207 Perl).
1208
1209 If "utf8" is enabled, you still get a correct UTF-8-encoded string,
1210 regardless of these flags, just some more characters will be
1211 escaped using "\uXXXX" than before.
1212
1213 Note that ISO-8859-1-encoded strings are not compatible with UTF-8
1214 encoding, while ASCII-encoded strings are. That is because the
1215 ISO-8859-1 encoding is NOT a subset of UTF-8 (despite the
1216 ISO-8859-1 codeset being a subset of Unicode), while ASCII is.
1217
1218 Surprisingly, "decode" will ignore these flags and so treat all
1219 input values as governed by the "utf8" flag. If it is disabled,
1220 this allows you to decode ISO-8859-1- and ASCII-encoded strings, as
1221 both are strict subsets of Unicode. If it is enabled, you can correctly
1222 decode UTF-8 encoded strings.
1223
1224 So neither "latin1" nor "ascii" is incompatible with the "utf8"
1225 flag - they only govern when the JSON output engine escapes a
1226 character or not.
1227
1228 The main use for "latin1" is to relatively efficiently store binary
1229 data as JSON, at the expense of breaking compatibility with most
1230 JSON decoders.
1231
1232 The main use for "ascii" is to force the output to not contain
1233 characters with values > 127, which means you can interpret the
1234 resulting string as UTF-8, ISO-8859-1, ASCII, KOI8-R or almost
1235 any character set and 8-bit-encoding, and still get the same data
1236 structure back. This is useful when your channel for JSON transfer
1237 is not 8-bit clean or the encoding might be mangled in between
1238 (e.g. in mail), and works because ASCII is a proper subset of most
1239 8-bit and multibyte encodings in use in the world.
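The following sketch encodes the same two-character string (U+00E9, which fits into Latin-1, and U+20AC, which does not) with each flag, so the differences described above can be compared side by side. The sample string is arbitrary:

```perl
use strict;
use warnings;
use JSON::XS;

# U+00E9 (e with acute accent) fits in Latin-1; U+20AC (euro sign) does not
my $data = ["\x{e9}\x{20ac}"];

my %out = (
    unicode => JSON::XS->new->encode($data),          # character string, nothing escaped
    utf8    => JSON::XS->new->utf8->encode($data),    # UTF-8 octets, nothing escaped
    latin1  => JSON::XS->new->latin1->encode($data),  # only U+20AC escaped
    ascii   => JSON::XS->new->ascii->encode($data),   # everything above 127 escaped
);

# lengths are in characters, which for the utf8 output are octets
printf "%-7s %2d\n", $_, length $out{$_} for sort keys %out;

# decoding with the same flags restores the original string
die "round trip failed"
    unless JSON::XS->new->ascii->decode($out{ascii})->[0] eq $data->[0];
```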
1240
1241 JSON and ECMAscript
1242 JSON syntax is based on how literals are represented in javascript (the
1243 not-standardised predecessor of ECMAscript) which is presumably why it
1244 is called "JavaScript Object Notation".
1245
1246 However, JSON is not a subset (and also not a superset of course) of
1247 ECMAscript (the standard) or javascript (whatever browsers actually
1248 implement).
1249
1250 If you want to use javascript's "eval" function to "parse" JSON, you
1251 might run into parse errors for valid JSON texts, or the resulting data
1252 structure might not be queryable:
1253
1254 One of the problems is that U+2028 and U+2029 are valid characters
1255 inside JSON strings, but are not allowed in ECMAscript string literals,
1256 so the following Perl fragment will not output something that can be
1257 guaranteed to be parsable by javascript's "eval":
1258
1259 use JSON::XS;
1260
1261 print encode_json [chr 0x2028];
1262
1263 The right fix for this is to use a proper JSON parser in your
1264 javascript programs, and not rely on "eval" (see for example Douglas
1265 Crockford's json2.js parser).
1266
1267 If this is not an option, you can, as a stop-gap measure, simply encode
1268 to ASCII-only JSON:
1269
1270 use JSON::XS;
1271
1272 print JSON::XS->new->ascii->encode ([chr 0x2028]);
1273
1274 Note that this will enlarge the resulting JSON text quite a bit if you
1275 have many non-ASCII characters. You might be tempted to run some
1276 regexes to only escape U+2028 and U+2029, e.g.:
1277
1278 # DO NOT USE THIS!
1279 my $json = JSON::XS->new->utf8->encode ([chr 0x2028]);
1280 $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
1281 $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
1282 print $json;
1283
1284 Note that this is a bad idea: the above only works for U+2028 and
1285 U+2029 and thus only for fully ECMAscript-compliant parsers. Many
1286 existing javascript implementations, however, have issues with other
1287 characters as well - using "eval" naively simply will cause problems.
1288
1289 Another problem is that some javascript implementations reserve some
1290 property names for their own purposes (which probably makes them non-
1291 ECMAscript-compliant). For example, Iceweasel reserves the "__proto__"
1292 property name for its own purposes.
1293
1294 If that is a problem, you could try to filter the resulting JSON
1295 output for these property strings, e.g.:
1296
1297 $json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
1298
1299 This works because "__proto__" is not valid outside of strings, so
1300 every occurrence of ""__proto__"\s*:" must be a string used as property
1301 name.
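Both stop-gap measures can be checked quickly. This sketch uses invented data and enables "canonical" only so that the key order, and therefore the output, is predictable:

```perl
use strict;
use warnings;
use JSON::XS;

# stop-gap 1: ASCII-only output escapes U+2028 (and every other non-ASCII character)
my $escaped = JSON::XS->new->ascii->encode([chr 0x2028]);
print "$escaped\n";   # prints ["\u2028"]

# stop-gap 2: rename "__proto__" only where it is used as a property name
my $json = JSON::XS->new->canonical->encode({ __proto__ => 1, name => "__proto__" });
$json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
print "$json\n";      # prints {"__proto__renamed":1,"name":"__proto__"}
```

The string value "__proto__" survives unchanged because it is not followed by a colon.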
1302
1303 If you know of other incompatibilities, please let me know.
1304
1305 JSON and YAML
1306 You often hear that JSON is a subset of YAML. This is, however, a mass
1307 hysteria(*) and very far from the truth (as of the time of this
1308 writing), so let me state it clearly: in general, there is no way to
1309 configure JSON::XS to output a data structure as valid YAML that works
1310 in all cases.
1311
1312 If you really must use JSON::XS to generate YAML, you should use this
1313 algorithm (subject to change in future versions):
1314
1315 my $to_yaml = JSON::XS->new->utf8->space_after (1);
1316 my $yaml = $to_yaml->encode ($ref) . "\n";
1317
1318 This will usually generate JSON texts that also parse as valid YAML.
1319 Please note that YAML has hardcoded limits on (simple) object key
1320 lengths that JSON doesn't have and also has different and incompatible
1321 unicode character escape syntax, so you should make sure that your hash
1322 keys are noticeably shorter than the 1024 "stream characters" YAML
1323 allows and that you do not have characters with codepoint values
1324 outside the Unicode BMP (basic multilingual page). YAML also does not
1325 allow "\/" sequences in strings (which JSON::XS does not currently
1326 generate, but other JSON generators might).
1327
1328 There might be other incompatibilities that I am not aware of (or the
1329 YAML specification has been changed yet again - it does so quite
1330 often). In general you should not try to generate YAML with a JSON
1331 generator or vice versa, or try to parse JSON with a YAML parser or
1332 vice versa: chances are high that you will run into severe
1333 interoperability problems when you least expect it.
1334
1335 (*) I have been pressured multiple times by Brian Ingerson (one of the
1336 authors of the YAML specification) to remove this paragraph,
1337 despite him acknowledging that the actual incompatibilities exist.
1338 As I was personally bitten by this "JSON is YAML" lie, I refused
1339 and said I will continue to educate people about these issues, so
1340 others do not run into the same problem again and again. After
1341 this, Brian called me a (quote)complete and worthless
1342 idiot(unquote).
1343
1344 In my opinion, instead of pressuring and insulting people who
1345 actually clarify issues with YAML and the wrong statements of some
1346 of its proponents, I would kindly suggest reading the JSON spec
1347 (which is not that difficult or long), finally making YAML
1348 compatible with it, and educating users about the changes, instead of
1349 spreading lies about the real compatibility for many years and
1350 trying to silence people who point out that it isn't true.
1351
1352 Addendum/2009: the YAML 1.2 spec is still incompatible with JSON,
1353 even though the incompatibilities have been documented (and are
1354 known to Brian) for many years and the spec makes explicit claims
1355 that YAML is a superset of JSON. It would be so easy to fix, but
1356 apparently, bullying people and corrupting userdata is so much
1357 easier.
1358
1359 SPEED
1360 It seems that JSON::XS is surprisingly fast, as shown in the following
1361 tables. They have been generated with the help of the "eg/bench"
1362 program in the JSON::XS distribution, to make it easy to compare on
1363 your own system.
1364
1365 First comes a comparison between various modules using a very short
1366 single-line JSON string (also available at
1367 <http://dist.schmorp.de/misc/json/short.json>).
1368
1369 {"method": "handleMessage", "params": ["user1",
1370 "we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7,
1371 1, 0]}
1372
1373 It shows the number of encodes/decodes per second (JSON::XS uses the
1374 functional interface, while JSON::XS/2 uses the OO interface with
1375 pretty-printing and hashkey sorting enabled, and JSON::XS/3 enables shrink;
1376 JSON::DWIW/DS uses the deserialise function, while JSON::DWIW/FJ uses
1377 the from_json method). Higher is better:
1378
1379 module | encode | decode |
1380 --------------|------------|------------|
1381 JSON::DWIW/DS | 86302.551 | 102300.098 |
1382 JSON::DWIW/FJ | 86302.551 | 75983.768 |
1383 JSON::PP | 15827.562 | 6638.658 |
1384 JSON::Syck | 63358.066 | 47662.545 |
1385 JSON::XS | 511500.488 | 511500.488 |
1386 JSON::XS/2 | 291271.111 | 388361.481 |
1387 JSON::XS/3 | 361577.931 | 361577.931 |
1388 Storable | 66788.280 | 265462.278 |
1389 --------------+------------+------------+
1390
1391 That is, JSON::XS is almost six times faster than JSON::DWIW on
1392 encoding, about five times faster on decoding, and roughly 32 (encoding)
1393 to 77 (decoding) times faster than JSON::PP, JSON's pure-Perl backend.
1394 It also compares favourably to Storable for small amounts of data.
1395
1396 Next, a longer test string (roughly 18KB, generated from the Yahoo! Locals
1397 search API, also available at <http://dist.schmorp.de/misc/json/long.json>):
1398
1399 module | encode | decode |
1400 --------------|------------|------------|
1401 JSON::DWIW/DS | 1647.927 | 2673.916 |
1402 JSON::DWIW/FJ | 1630.249 | 2596.128 |
1403 JSON::PP | 400.640 | 62.311 |
1404 JSON::Syck | 1481.040 | 1524.869 |
1405 JSON::XS | 20661.596 | 9541.183 |
1406 JSON::XS/2 | 10683.403 | 9416.938 |
1407 JSON::XS/3 | 20661.596 | 9400.054 |
1408 Storable | 19765.806 | 10000.725 |
1409 --------------+------------+------------+
1410
1411 Again, JSON::XS leads by far (except for Storable, which unsurprisingly
1412 decodes a bit faster).
1413
1414 On large strings containing lots of high Unicode characters, some
1415 modules (such as JSON::PC) seem to decode faster than JSON::XS, but the
1416 result will be broken due to missing (or wrong) Unicode handling.
1417 Others refuse to decode or encode properly, so it was impossible to
1418 prepare a fair comparison table for that case.
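To get a rough idea on your own machine without the eg/bench program, a sketch along the following lines compares JSON::XS with JSON::PP using Perl's core Benchmark module. It is an assumption-laden stand-in, not the script that produced the tables above, and absolute numbers will vary:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use JSON::XS ();
use JSON::PP ();

# the short test document from the first table
my $text = '{"method": "handleMessage", "params": ["user1", '
         . '"we were just talking"], "id": null, '
         . '"array":[1,11,234,-5,1e5,1e7, 1, 0]}';

my $xs   = JSON::XS->new;
my $pp   = JSON::PP->new;
my $data = $xs->decode($text);

# run each sub for at least one CPU second and print a comparison table
cmpthese(-1, {
    'JSON::XS encode' => sub { $xs->encode($data) },
    'JSON::PP encode' => sub { $pp->encode($data) },
    'JSON::XS decode' => sub { $xs->decode($text) },
    'JSON::PP decode' => sub { $pp->decode($text) },
});
```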
1419
1420 SECURITY CONSIDERATIONS
1421 When you are using JSON in a protocol, talking to untrusted, potentially
1422 hostile creatures requires relatively few measures.
1423
1424 First of all, your JSON decoder should be secure, that is, should not
1425 have any buffer overflows. Obviously, this module should ensure that
1426 and I am trying hard on making that true, but you never know.
1427
1428 Second, you need to avoid resource-starving attacks. That means you
1429 should limit the size of JSON texts you accept, or make sure that when
1430 your resources run out, that's just fine (e.g. by using a separate
1431 process that can crash safely). The size of a JSON text in octets or
1432 characters is usually a good indication of the size of the resources
1433 required to decode it into a Perl structure. While JSON::XS can check
1434 the size of the JSON text, it might be too late when you already have
1435 it in memory, so you might want to check the size before you accept the
1436 string.
1437
1438 Third, JSON::XS recurses using the C stack when decoding objects and
1439 arrays. The C stack is a limited resource: for instance, on my amd64
1440 machine with 8MB of stack size I can decode around 180k nested arrays
1441 but only 14k nested JSON objects (due to perl itself recursing deeply
1442 on croak to free the temporary). If that is exceeded, the program
1443 crashes. To be conservative, the default nesting limit is set to 512.
1444 If your process has a smaller stack, you should adjust this setting
1445 accordingly with the "max_depth" method.
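A minimal sketch of the size and depth limits discussed above. The limits (64 KiB, depth 32) and the "decode_untrusted" helper are hypothetical choices for illustration, not recommendations:

```perl
use strict;
use warnings;
use JSON::XS;

# illustrative limits only; pick values that fit your own protocol
my $decoder = JSON::XS->new->utf8->max_size(64 * 1024)->max_depth(32);

# hypothetical helper: returns ($data, undef) on success, (undef, $error) on failure
sub decode_untrusted {
    my ($octets) = @_;
    my $data;
    eval { $data = $decoder->decode($octets); 1 }
        or return (undef, "invalid JSON: $@");
    return ($data, undef);
}

my ($ok)              = decode_untrusted('{"a":[1,2,3]}');
my (undef, $deep_err) = decode_untrusted('[' x 100 . ']' x 100);             # nesting > 32
my (undef, $big_err)  = decode_untrusted('[' . join(',', (1) x 40_000) . ']'); # > 64 KiB

print "deep: $deep_err";
print "big:  $big_err";
```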
1446
1447 Something else could bomb you, too, that I forgot to think of. In that
1448 case, you get to keep the pieces. I am always open for hints, though...
1449
1450 Also keep in mind that JSON::XS might leak contents of your Perl data
1451 structures in its error messages, so when you serialise sensitive
1452 information you might want to make sure that exceptions thrown by
1453 JSON::XS will not end up in front of untrusted eyes.
1454
1455 If you are using JSON::XS to return packets for consumption by
1456 JavaScript scripts in a browser you should have a look at
1457 <http://blog.archive.jpsykes.com/47/practical-csrf-and-json-security/>
1458 to see whether you are vulnerable to some common attack vectors (which
1459 really are browser design bugs, but it is still you who will have to
1460 deal with it, as major browser developers care only for features, not
1461 about getting security right).
1462
1464 TL;DR: Due to security concerns, JSON::XS will not allow scalar data in
1465 JSON texts by default - you need to create your own JSON::XS object and
1466 enable "allow_nonref":
1467
1468 my $json = JSON::XS->new->allow_nonref;
1469
1470 $text = $json->encode ($data);
1471 $data = $json->decode ($text);
1472
1473 The long version: JSON being an important and supposedly stable format,
1474 the IETF standardised it as RFC 4627 in 2006. Unfortunately, the
1475 inventor of JSON, Douglas Crockford, unilaterally changed the
1476 definition of JSON in javascript. Rather than create a fork, the IETF
1477 decided to standardise the new syntax (apparently, so I was told,
1478 without finding it very amusing).
1479
1480 The biggest difference between the original JSON and the new JSON is
1481 that the new JSON supports scalars (anything other than arrays and
1482 objects) at the toplevel of a JSON text. While this is strictly
1483 backwards compatible with older versions, it breaks a number of protocols
1484 that relied on sending JSON back-to-back, and is a minor security
1485 concern.
1486
1487 For example, imagine you have two banks communicating, and on one side,
1488 the JSON coder gets upgraded. Two messages, such as 10 and 1000, might
1489 then be confused to mean 101000, something that couldn't happen in the
1490 original JSON, because neither of these messages would be valid JSON.
1491
1492 If one side accepts these messages, then an upgrade in the coder on
1493 either side could result in this becoming exploitable.
1494
This module has always allowed these messages as an optional extension,
disabled by default. The security concerns are the reason why the
default is still disabled, but future versions will likely switch to
the newer RFC as the default format, so you are advised to check your
implementation and/or override the default with "->allow_nonref (0)"
to ensure that your code stays safe with future versions.
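A minimal sketch of pinning the setting, so that a change of default cannot silently alter behaviour. It uses JSON::PP, which accepts the same "allow_nonref" call; the helper "decode_message" is hypothetical and only illustrates turning the croak into an error value.

```perl
use strict;
use warnings;
use JSON::PP;

# Pin the setting explicitly instead of relying on the module default.
my $coder = JSON::PP->new->allow_nonref(0);

# Hypothetical helper: decode one message, returning (undef, $error)
# instead of dying when the text is not an array or object.
sub decode_message {
    my ($text) = @_;
    my $data = eval { $coder->decode($text) };
    return (undef, $@) unless defined $data;
    return ($data, undef);
}

my ($ok_data,  $ok_err)  = decode_message('{"amount":10}');
my ($bad_data, $bad_err) = decode_message('10');

print "object accepted: amount=$ok_data->{amount}\n";
print "scalar rejected: ", ($bad_err ? "yes" : "no"), "\n";

# The setting also applies when encoding: a bare scalar is refused.
my $enc_ok = eval { $coder->encode(10); 1 };
print "encoding a scalar: ", ($enc_ok ? "allowed" : "refused"), "\n";
```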
1501
"JSON::XS" uses the Types::Serialiser module to provide boolean
constants. That means that the JSON true and false values will be
compatible with true and false values of other modules that do the
same, such as JSON::PP and CBOR::XS.
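The practical effect is that decoded booleans keep their type and survive a round trip. The sketch below uses JSON::PP, one of the modules named above; the class name it prints is the one JSON::PP uses, and other compatible modules are only expected to interoperate with it, not necessarily to print the same thing.

```perl
use strict;
use warnings;
use JSON::PP;

my $coder = JSON::PP->new;

# JSON true/false decode to blessed boolean objects, not plain 1/"".
my $flags = $coder->decode('[true,false]');

print "class: ", ref($flags->[0]), "\n";
print "first is ",  ($flags->[0] ? "true" : "false"), "\n";
print "second is ", ($flags->[1] ? "true" : "false"), "\n";

# Because the objects keep their type, they encode back as JSON booleans.
print $coder->encode($flags), "\n";

# Plain Perl values have no boolean type; use the module's constants.
print $coder->encode([$JSON::PP::true, $JSON::PP::false]), "\n";
```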
1507
1509 As long as you only serialise data that can be directly expressed in
1510 JSON, "JSON::XS" is incapable of generating invalid JSON output (modulo
1511 bugs, but "JSON::XS" has found more bugs in the official JSON testsuite
1512 (1) than the official JSON testsuite has found in "JSON::XS" (0)).
1513
1514 When you have trouble decoding JSON generated by this module using
1515 other decoders, then it is very likely that you have an encoding
1516 mismatch or the other decoder is broken.
1517
1518 When decoding, "JSON::XS" is strict by default and will likely catch
1519 all errors. There are currently two settings that change this:
1520 "relaxed" makes "JSON::XS" accept (but not generate) some non-standard
1521 extensions, and "allow_tags" will allow you to encode and decode Perl
1522 objects, at the cost of not outputting valid JSON anymore.
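To see the difference "relaxed" makes, the following sketch feeds a text with a trailing comma and a shell-style comment to a strict and a relaxed decoder. It uses JSON::PP, whose "relaxed" option accepts these particular extensions; the exact set of extensions accepted by JSON::XS is described in its own "relaxed" documentation.

```perl
use strict;
use warnings;
use JSON::PP;

# Trailing comma and a shell-style comment: not valid JSON.
my $text = qq([1, 2, # a comment\n 3,]);

my $strict  = JSON::PP->new;
my $relaxed = JSON::PP->new->relaxed;

my $strict_ok = eval { $strict->decode($text); 1 };
print "strict decoder: ", ($strict_ok ? "accepted" : "rejected"), "\n";

my $data = $relaxed->decode($text);
print "relaxed decoder read ", scalar(@$data), " elements\n";

# relaxed only affects decoding: the output is still standard JSON.
print $relaxed->encode($data), "\n";
```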
1523
1524 TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS
When you use "allow_tags" to use the extended (and also nonstandard and
invalid) JSON syntax for serialised objects, and you still want to
decode the generated JSON with a standard JSON decoder, you can run a
regex to replace the tagged syntax with standard JSON arrays (this only
works for "normal" package names without commas, newlines or single
colons). First, the readable Perl version:
1531
1532 # if your FREEZE methods return no values, you need this replace first:
1533 $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;
1534
1535 # this works for non-empty constructor arg lists:
1536 $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;
1537
1538 And here is a less readable version that is easy to adapt to other
1539 languages:
1540
1541 $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;
1542
1543 Here is an ECMAScript version (same regex):
1544
1545 json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,");
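Applied to a small input, the two readable Perl substitutions behave as follows. The tagged text is written by hand to mimic "allow_tags" output, and the class names "My::Point" and "My::Empty" are hypothetical; the decode step uses the core JSON::PP module as a stand-in for any standard decoder.

```perl
use strict;
use warnings;
use JSON::PP;

# Hand-written tagged output: ("class")[constructor args].
my $json = '{"origin":("My::Point")[0,0],"marker":("My::Empty")[]}';

# Empty argument lists first, then the general case.
$json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;
$json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;

print "$json\n";

# The result is plain JSON that a standard decoder accepts.
my $data = JSON::PP->new->decode($json);
print "origin class: $data->{origin}[0]\n";
```

The order matters: running the general substitution first would turn "("My::Empty")[]" into "["My::Empty",]", which strict decoders reject.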
1546
1547 Since this syntax converts to standard JSON arrays, it might be hard to
1548 distinguish serialised objects from normal arrays. You can prepend a
1549 "magic number" as first array element to reduce chances of a collision:
1550
1551 $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g;
1552
1553 And after decoding the JSON text, you could walk the data structure
1554 looking for arrays with a first element of
1555 "XU1peReLzT4ggEllLanBYq4G9VzliwKF".
1556
1557 The same approach can be used to create the tagged format with another
1558 encoder. First, you create an array with the magic string as first
1559 member, the classname as second, and constructor arguments last, encode
1560 it as part of your JSON structure, and then:
1561
1562 $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;
1563
1564 Again, this has some limitations - the magic string must not be encoded
1565 with character escapes, and the constructor arguments must be non-
1566 empty.
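A round trip through both directions might look like this sketch, assuming a hypothetical class "My::Point" with a non-empty argument list (as the limitation above requires). JSON::PP stands in for "another encoder"; the regexes are the ones given in this section.

```perl
use strict;
use warnings;
use JSON::PP;

my $MAGIC = "XU1peReLzT4ggEllLanBYq4G9VzliwKF";

# Build the object as an ordinary array: magic string, class, args.
my $json = JSON::PP->new->encode({ p => [$MAGIC, "My::Point", 1, 2] });

# Rewrite it into the tagged syntax.
$json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;
print "$json\n";

# The magic-prefix substitution converts it back to plain JSON.
(my $plain = $json) =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g;
my $data = JSON::PP->new->decode($plain);
print "class after round trip: $data->{p}[1]\n";
```

A single-key hash is used so that the encoded key order, and thus the intermediate text, is predictable.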
1567
1569 Since this module was written, Google has written a new JSON RFC, RFC
1570 7159 (and RFC7158). Unfortunately, this RFC breaks compatibility with
1571 both the original JSON specification on www.json.org and RFC4627.
1572
1573 As far as I can see, you can get partial compatibility when parsing by
1574 using "->allow_nonref". However, consider the security implications of
1575 doing so.
1576
1577 I haven't decided yet when to break compatibility with RFC4627 by
1578 default (and potentially leave applications insecure) and change the
1579 default to follow RFC7159, but application authors are well advised to
1580 call "->allow_nonref(0)" even if this is the current default, if they
1581 cannot handle non-reference values, in preparation for the day when the
1582 default will change.
1583
This module is not guaranteed to be ithread (or MULTIPLICITY-) safe and
there are no plans to change this. Note that perl's builtin so-called
threads/ithreads are officially discouraged and should not be used.
1588
1590 Sometimes people avoid the Perl locale support and directly call the
1591 system's setlocale function with "LC_ALL".
1592
1593 This breaks both perl and modules such as JSON::XS, as stringification
1594 of numbers no longer works correctly (e.g. "$x = 0.1; print "$x"+1"
1595 might print 1, and JSON::XS might output illegal JSON as JSON::XS
1596 relies on perl to stringify numbers).
1597
1598 The solution is simple: don't call "setlocale", or use it for only
1599 those categories you need, such as "LC_MESSAGES" or "LC_CTYPE".
1600
1601 If you need "LC_NUMERIC", you should enable it only around the code
1602 that actually needs it (avoiding stringification of numbers), and
1603 restore it afterwards.
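A minimal sketch of scoping "LC_NUMERIC" to the code that needs it, using "POSIX::setlocale" from Perl. The helper "with_numeric_locale" is hypothetical, and the locale name "de_DE.UTF-8" is an assumption that may not be installed; in that case "setlocale" returns undef and the code simply runs under the current locale.

```perl
use strict;
use warnings;
use POSIX qw(setlocale localeconv LC_NUMERIC);
use JSON::PP;

# Hypothetical helper: switch only LC_NUMERIC, run the code, and always
# restore the previous setting, even if the code dies.
sub with_numeric_locale {
    my ($locale, $code) = @_;
    my $saved = setlocale(LC_NUMERIC);
    setlocale(LC_NUMERIC, $locale);    # undef if the locale is missing
    my @result = eval { $code->() };
    my $err = $@;
    setlocale(LC_NUMERIC, $saved);
    die $err if $err;
    return @result;
}

# Locale-dependent work happens inside the helper (without stringifying
# numbers there)...
my ($sep) = with_numeric_locale("de_DE.UTF-8",
                                sub { localeconv()->{decimal_point} });
print "decimal point inside: $sep\n";

# ...and serialisation happens outside it, after the restore.
print JSON::PP->new->encode([0.1]), "\n";
```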
1604
While the goal of this module is to be correct, that unfortunately does
not mean it's bug-free, only that I think its design is bug-free. If
you report bugs, they will be fixed swiftly, though.
1609
1610 Please refrain from using rt.cpan.org or any other bug reporting
1611 service. I put the contact address into my modules for a reason.
1612
1614 The json_xs command line utility for quick experiments.
1615
1617 Marc Lehmann <schmorp@schmorp.de>
1618 http://home.schmorp.de/
1619
1620
1621
1622perl v5.28.0 2017-08-17 XS(3)