XS(3) User Contributed Perl Documentation XS(3)

NAME
JSON::XS - JSON serialising/deserialising, done correctly and fast

JSON::XS - a correct and fast JSON serialiser/deserialiser
(http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html)
10
SYNOPSIS
use JSON::XS;
13
14 # exported functions, they croak on error
15 # and expect/generate UTF-8
16
17 $utf8_encoded_json_text = encode_json $perl_hash_or_arrayref;
18 $perl_hash_or_arrayref = decode_json $utf8_encoded_json_text;
19
20 # OO-interface
21
22 $coder = JSON::XS->new->ascii->pretty->allow_nonref;
23 $pretty_printed_unencoded = $coder->encode ($perl_scalar);
24 $perl_scalar = $coder->decode ($unicode_json_text);
25
26 # Note that JSON version 2.0 and above will automatically use JSON::XS
27 # if available, at virtually no speed overhead either, so you should
28 # be able to just:
29
30 use JSON;
31
32 # and do the same things, except that you have a pure-perl fallback now.
33
DESCRIPTION
This module converts Perl data structures to JSON and vice versa. Its
primary goal is to be correct and its secondary goal is to be fast. To
reach the latter goal it was written in C.
38
39 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
40 vice versa.
41
42 FEATURES
43 • correct Unicode handling
44
45 This module knows how to handle Unicode, documents how and when it
46 does so, and even documents what "correct" means.
47
48 • round-trip integrity
49
50 When you serialise a perl data structure using only data types
51 supported by JSON and Perl, the deserialised data structure is
52 identical on the Perl level. (e.g. the string "2.0" doesn't
53 suddenly become "2" just because it looks like a number). There are
54 minor exceptions to this, read the MAPPING section below to learn
55 about those.
56
57 • strict checking of JSON correctness
58
59 There is no guessing, no generating of illegal JSON texts by
60 default, and only JSON is accepted as input by default (the latter
61 is a security feature).
62
63 • fast
64
65 Compared to other JSON modules and other serialisers such as
66 Storable, this module usually compares favourably in terms of
67 speed, too.
68
69 • simple to use
70
71 This module has both a simple functional interface as well as an
72 object oriented interface.
73
74 • reasonably versatile output formats
75
76 You can choose between the most compact guaranteed-single-line
77 format possible (nice for simple line-based protocols), a pure-
78 ASCII format (for when your transport is not 8-bit clean, still
79 supports the whole Unicode range), or a pretty-printed format (for
80 when you want to read that stuff). Or you can combine those
81 features in whatever way you like.
82
FUNCTIONAL INTERFACE
The following convenience functions are provided by this module. They are
exported by default:
86
87 $json_text = encode_json $perl_scalar
88 Converts the given Perl data structure to a UTF-8 encoded, binary
89 string (that is, the string contains octets only). Croaks on error.
90
91 This function call is functionally identical to:
92
93 $json_text = JSON::XS->new->utf8->encode ($perl_scalar)
94
95 Except being faster.
96
97 $perl_scalar = decode_json $json_text
98 The opposite of "encode_json": expects a UTF-8 (binary) string and
99 tries to parse that as a UTF-8 encoded JSON text, returning the
100 resulting reference. Croaks on error.
101
102 This function call is functionally identical to:
103
104 $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
105
106 Except being faster.
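
As a minimal sketch of the two exported functions working together (the sample data is made up for illustration), the following round-trips a structure containing one non-ASCII character and checks that the character survives:

```perl
use strict;
use warnings;
use JSON::XS;

# A Perl structure holding one non-ASCII character (U+00E9, e-acute).
my $data = { name => "caf\x{e9}", tags => [ "a", "b" ] };

# encode_json returns octets; U+00E9 becomes the two bytes C3 A9.
my $octets = encode_json $data;

# decode_json takes those octets and gives back Perl characters.
my $copy = decode_json $octets;

print "round trip ok\n" if $copy->{name} eq $data->{name};
```

The encoded text is suitable for binary I/O as-is; no further encoding layer should be applied to it.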
107
A FEW NOTES ON UNICODE AND PERL
Since this often leads to confusion, here are a few very clear words on
how Unicode works in Perl, modulo bugs.
111
112 1. Perl strings can store characters with ordinal values > 255.
113 This enables you to store Unicode characters as single characters
114 in a Perl string - very natural.
115
116 2. Perl does not associate an encoding with your strings.
117 ... until you force it to, e.g. when matching it against a regex,
118 or printing the scalar to a file, in which case Perl either
119 interprets your string as locale-encoded text, octets/binary, or as
120 Unicode, depending on various settings. In no case is an encoding
121 stored together with your data, it is use that decides encoding,
122 not any magical meta data.
123
124 3. The internal utf-8 flag has no meaning with regards to the encoding
125 of your string.
126 Just ignore that flag unless you debug a Perl bug, a module written
127 in XS or want to dive into the internals of perl. Otherwise it will
128 only confuse you, as, despite the name, it says nothing about how
129 your string is encoded. You can have Unicode strings with that flag
130 set, with that flag clear, and you can have binary data with that
131 flag set and that flag clear. Other possibilities exist, too.
132
If you didn't know about that flag, so much the better: pretend it
doesn't exist.
135
136 4. A "Unicode String" is simply a string where each character can be
137 validly interpreted as a Unicode code point.
138 If you have UTF-8 encoded data, it is no longer a Unicode string,
139 but a Unicode string encoded in UTF-8, giving you a binary string.
140
141 5. A string containing "high" (> 255) character values is not a UTF-8
142 string.
143 It's a fact. Learn to live with it.
144
145 I hope this helps :)
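
The distinction between a Unicode string and its UTF-8 encoded form (points 4 and 5 above) can be illustrated with the core Encode module. This is only a sketch; the sample character is arbitrary:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A Unicode string: one character with ordinal 0x263A (> 255).
my $unicode = "\x{263A}";

# The same text encoded as UTF-8: a binary string of three octets.
my $octets = encode ("UTF-8", $unicode);

printf "characters: %d, octets: %d\n", length $unicode, length $octets;
# prints "characters: 1, octets: 3"

# Decoding the octets yields the original Unicode string again.
my $back = decode ("UTF-8", $octets);
```

Which of the two forms a JSON::XS object expects or produces is controlled by the "utf8" flag.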
146
OBJECT-ORIENTED INTERFACE
The object-oriented interface lets you configure your own encoding or
decoding style, within the limits of supported formats.
150
151 $json = new JSON::XS
152 Creates a new JSON::XS object that can be used to de/encode JSON
153 strings. All boolean flags described below are by default disabled
154 (with the exception of "allow_nonref", which defaults to enabled
155 since version 4.0).
156
157 The mutators for flags all return the JSON object again and thus
158 calls can be chained:
159
160 my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
161 => {"a": [1, 2]}
162
163 $json = $json->ascii ([$enable])
164 $enabled = $json->get_ascii
165 If $enable is true (or missing), then the "encode" method will not
166 generate characters outside the code range 0..127 (which is ASCII).
167 Any Unicode characters outside that range will be escaped using
either a single \uXXXX (BMP characters) or a double \uHHHH\uLLLL
169 escape sequence, as per RFC4627. The resulting encoded JSON text
170 can be treated as a native Unicode string, an ascii-encoded,
171 latin1-encoded or UTF-8 encoded string, or any other superset of
172 ASCII.
173
174 If $enable is false, then the "encode" method will not escape
175 Unicode characters unless required by the JSON syntax or other
176 flags. This results in a faster and more compact format.
177
178 See also the section ENCODING/CODESET FLAG NOTES later in this
179 document.
180
181 The main use for this flag is to produce JSON texts that can be
182 transmitted over a 7-bit channel, as the encoded JSON texts will
183 not contain any 8 bit characters.
184
185 JSON::XS->new->ascii (1)->encode ([chr 0x10401])
186 => ["\ud801\udc01"]
187
188 $json = $json->latin1 ([$enable])
189 $enabled = $json->get_latin1
190 If $enable is true (or missing), then the "encode" method will
191 encode the resulting JSON text as latin1 (or iso-8859-1), escaping
192 any characters outside the code range 0..255. The resulting string
193 can be treated as a latin1-encoded JSON text or a native Unicode
194 string. The "decode" method will not be affected in any way by this
195 flag, as "decode" by default expects Unicode, which is a strict
196 superset of latin1.
197
198 If $enable is false, then the "encode" method will not escape
199 Unicode characters unless required by the JSON syntax or other
200 flags.
201
202 See also the section ENCODING/CODESET FLAG NOTES later in this
203 document.
204
205 The main use for this flag is efficiently encoding binary data as
206 JSON text, as most octets will not be escaped, resulting in a
207 smaller encoded size. The disadvantage is that the resulting JSON
208 text is encoded in latin1 (and must correctly be treated as such
209 when storing and transferring), a rare encoding for JSON. It is
210 therefore most useful when you want to store data structures known
211 to contain binary data efficiently in files or databases, not when
212 talking to other JSON encoders/decoders.
213
JSON::XS->new->latin1->encode (["\x{89}\x{abc}"])
=> ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not)
216
217 $json = $json->utf8 ([$enable])
218 $enabled = $json->get_utf8
219 If $enable is true (or missing), then the "encode" method will
220 encode the JSON result into UTF-8, as required by many protocols,
221 while the "decode" method expects to be handed a UTF-8-encoded
222 string. Please note that UTF-8-encoded strings do not contain any
223 characters outside the range 0..255, they are thus useful for
224 bytewise/binary I/O. In future versions, enabling this option might
225 enable autodetection of the UTF-16 and UTF-32 encoding families, as
226 described in RFC4627.
227
If $enable is false, then the "encode" method will return the JSON
string as a (non-encoded) Unicode string, while "decode" thus expects
a Unicode string. Any decoding or encoding (e.g. to UTF-8 or
UTF-16) is then up to you, e.g. using the Encode module.
232
233 See also the section ENCODING/CODESET FLAG NOTES later in this
234 document.
235
236 Example, output UTF-16BE-encoded JSON:
237
238 use Encode;
239 $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);
240
241 Example, decode UTF-32LE-encoded JSON:
242
243 use Encode;
244 $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
245
246 $json = $json->pretty ([$enable])
247 This enables (or disables) all of the "indent", "space_before" and
248 "space_after" (and in the future possibly more) flags in one call
249 to generate the most readable (or most compact) form possible.
250
251 Example, pretty-print some simple structure:
252
253 my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
254 =>
255 {
256 "a" : [
257 1,
258 2
259 ]
260 }
261
262 $json = $json->indent ([$enable])
263 $enabled = $json->get_indent
264 If $enable is true (or missing), then the "encode" method will use
265 a multiline format as output, putting every array member or
266 object/hash key-value pair into its own line, indenting them
267 properly.
268
269 If $enable is false, no newlines or indenting will be produced, and
270 the resulting JSON text is guaranteed not to contain any
271 "newlines".
272
273 This setting has no effect when decoding JSON texts.
274
275 $json = $json->space_before ([$enable])
276 $enabled = $json->get_space_before
277 If $enable is true (or missing), then the "encode" method will add
278 an extra optional space before the ":" separating keys from values
279 in JSON objects.
280
281 If $enable is false, then the "encode" method will not add any
282 extra space at those places.
283
284 This setting has no effect when decoding JSON texts. You will also
285 most likely combine this setting with "space_after".
286
287 Example, space_before enabled, space_after and indent disabled:
288
289 {"key" :"value"}
290
291 $json = $json->space_after ([$enable])
292 $enabled = $json->get_space_after
293 If $enable is true (or missing), then the "encode" method will add
294 an extra optional space after the ":" separating keys from values
295 in JSON objects and extra whitespace after the "," separating key-
296 value pairs and array members.
297
298 If $enable is false, then the "encode" method will not add any
299 extra space at those places.
300
301 This setting has no effect when decoding JSON texts.
302
303 Example, space_before and indent disabled, space_after enabled:
304
305 {"key": "value"}
306
307 $json = $json->relaxed ([$enable])
308 $enabled = $json->get_relaxed
309 If $enable is true (or missing), then "decode" will accept some
310 extensions to normal JSON syntax (see below). "encode" will not be
affected in any way. Be aware that this option makes you accept
invalid JSON texts as if they were valid! I suggest using this option
only to parse application-specific files written by humans
(configuration files, resource files etc.).
315
316 If $enable is false (the default), then "decode" will only accept
317 valid JSON texts.
318
319 Currently accepted extensions are:
320
321 • list items can have an end-comma
322
323 JSON separates array elements and key-value pairs with commas.
324 This can be annoying if you write JSON texts manually and want
325 to be able to quickly append elements, so this extension
326 accepts comma at the end of such items not just between them:
327
328 [
329 1,
330 2, <- this comma not normally allowed
331 ]
332 {
333 "k1": "v1",
334 "k2": "v2", <- this comma not normally allowed
335 }
336
337 • shell-style '#'-comments
338
339 Whenever JSON allows whitespace, shell-style comments are
340 additionally allowed. They are terminated by the first
341 carriage-return or line-feed character, after which more white-
342 space and comments are allowed.
343
344 [
345 1, # this comment not allowed in JSON
346 # neither this one...
347 ]
348
349 • literal ASCII TAB characters in strings
350
351 Literal ASCII TAB characters are now allowed in strings (and
352 treated as "\t").
353
354 [
355 "Hello\tWorld",
356 "Hello<TAB>World", # literal <TAB> would not normally be allowed
357 ]
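
A short sketch of the extensions in use, assuming a hand-written configuration snippet (the keys and values are invented for illustration). The same text is rejected by a decoder without "relaxed":

```perl
use strict;
use warnings;
use JSON::XS;

# Hand-written text using trailing commas and '#'-comments.
my $text = <<'JSON';
{
   "name": "demo",         # shell-style comment
   "ports": [ 80, 443, ],  # trailing comma inside the array
}
JSON

# Accepted once relaxed is enabled ...
my $config = JSON::XS->new->relaxed->decode ($text);

# ... but a parse error for a strict decoder, so the eval fails.
my $strict_ok = eval { JSON::XS->new->decode ($text); 1 };

print "ports: @{ $config->{ports} }\n";
```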
358
359 $json = $json->canonical ([$enable])
360 $enabled = $json->get_canonical
361 If $enable is true (or missing), then the "encode" method will
output JSON objects by sorting their keys. This adds a
comparatively high overhead.
364
365 If $enable is false, then the "encode" method will output key-value
366 pairs in the order Perl stores them (which will likely change
367 between runs of the same script, and can change even within the
368 same run from 5.18 onwards).
369
370 This option is useful if you want the same data structure to be
encoded as the same JSON text (given the same overall settings). If
it is disabled, the same hash might be encoded differently even if
it contains the same data, as key-value pairs have no inherent
ordering in Perl.
375
376 This setting has no effect when decoding JSON texts.
377
378 This setting has currently no effect on tied hashes.
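
As an illustration (the hash contents are arbitrary), enabling "canonical" makes the output independent of Perl's internal hash order:

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new->canonical;

# Keys are written in sorted order, whatever order Perl stores them in.
my $text = $json->encode ({ b => 2, a => 1, c => 3 });

print "$text\n";   # prints {"a":1,"b":2,"c":3}
```

This is mainly useful when the JSON text is compared, hashed, or kept under version control.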
379
380 $json = $json->allow_nonref ([$enable])
381 $enabled = $json->get_allow_nonref
Unlike other boolean options, this option is enabled by default
beginning with version 4.0. See "SECURITY CONSIDERATIONS" for the
gory details.
385
386 If $enable is true (or missing), then the "encode" method can
387 convert a non-reference into its corresponding string, number or
388 null JSON value, which is an extension to RFC4627. Likewise,
389 "decode" will accept those JSON values instead of croaking.
390
391 If $enable is false, then the "encode" method will croak if it
392 isn't passed an arrayref or hashref, as JSON texts must either be
393 an object or array. Likewise, "decode" will croak if given
394 something that is not a JSON object or array.
395
Example, encode a Perl scalar as a JSON value with "allow_nonref"
disabled, resulting in an error:
398
399 JSON::XS->new->allow_nonref (0)->encode ("Hello, World!")
400 => hash- or arrayref expected...
401
402 $json = $json->allow_unknown ([$enable])
403 $enabled = $json->get_allow_unknown
404 If $enable is true (or missing), then "encode" will not throw an
405 exception when it encounters values it cannot represent in JSON
406 (for example, filehandles) but instead will encode a JSON "null"
value. Note that blessed objects are not included here and are
handled separately by "allow_blessed".
409
410 If $enable is false (the default), then "encode" will throw an
411 exception when it encounters anything it cannot encode as JSON.
412
413 This option does not affect "decode" in any way, and it is
414 recommended to leave it off unless you know your communications
415 partner.
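
A small sketch of the difference, using a code reference as the value that has no JSON representation (the data is made up for illustration):

```perl
use strict;
use warnings;
use JSON::XS;

# A code reference cannot be represented in JSON.
my $data = [ 1, sub { 42 } ];

# Default behaviour: encode croaks, so the eval returns undef.
my $strict = eval { JSON::XS->new->encode ($data) };

# With allow_unknown, the unsupported value is written as null instead.
my $lenient = JSON::XS->new->allow_unknown->encode ($data);

print "$lenient\n";   # prints [1,null]
```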
416
417 $json = $json->allow_blessed ([$enable])
418 $enabled = $json->get_allow_blessed
419 See "OBJECT SERIALISATION" for details.
420
421 If $enable is true (or missing), then the "encode" method will not
422 barf when it encounters a blessed reference that it cannot convert
423 otherwise. Instead, a JSON "null" value is encoded instead of the
424 object.
425
426 If $enable is false (the default), then "encode" will throw an
427 exception when it encounters a blessed object that it cannot
428 convert otherwise.
429
430 This setting has no effect on "decode".
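
For illustration, here is a sketch with a blessed hash whose class, the hypothetical "My::Opaque", provides no serialisation methods at all:

```perl
use strict;
use warnings;
use JSON::XS;

# My::Opaque is a made-up class name; it defines no TO_JSON or FREEZE.
my $obj = bless { secret => 1 }, "My::Opaque";

# Default behaviour: encode croaks on the blessed reference.
my $failed = !eval { JSON::XS->new->encode ([$obj]); 1 };

# With allow_blessed, the object is replaced by a JSON null.
my $text = JSON::XS->new->allow_blessed->encode ([$obj]);

print "$text\n";   # prints [null]
```

Note that the object's contents are dropped, not serialised; use "convert_blessed" or "allow_tags" when the data itself is needed.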
431
432 $json = $json->convert_blessed ([$enable])
433 $enabled = $json->get_convert_blessed
434 See "OBJECT SERIALISATION" for details.
435
436 If $enable is true (or missing), then "encode", upon encountering a
437 blessed object, will check for the availability of the "TO_JSON"
438 method on the object's class. If found, it will be called in scalar
439 context and the resulting scalar will be encoded instead of the
440 object.
441
442 The "TO_JSON" method may safely call die if it wants. If "TO_JSON"
443 returns other blessed objects, those will be handled in the same
444 way. "TO_JSON" must take care of not causing an endless recursion
445 cycle (== crash) in this case. The name of "TO_JSON" was chosen
446 because other methods called by the Perl core (== not by the user
447 of the object) are usually in upper case letters and to avoid
448 collisions with any "to_json" function or method.
449
450 If $enable is false (the default), then "encode" will not consider
451 this type of conversion.
452
453 This setting has no effect on "decode".
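
A minimal sketch of a class with a "TO_JSON" method. The class name "My::Point" and its fields are assumptions made up for this example, and "canonical" is enabled only to make the output order predictable:

```perl
use strict;
use warnings;
use JSON::XS;

package My::Point;   # hypothetical example class

sub new {
   my ($class, %args) = @_;
   return bless { %args }, $class;
}

# Called by encode when convert_blessed is enabled; returns a plain
# hashref that is encoded in place of the object.
sub TO_JSON {
   my ($self) = @_;
   return { x => $self->{x}, y => $self->{y} };
}

package main;

my $json = JSON::XS->new->canonical->convert_blessed;
my $text = $json->encode ([ My::Point->new (x => 1, y => 2) ]);

print "$text\n";   # prints [{"x":1,"y":2}]
```

The conversion is one-way: decoding this text yields a plain hash, not a "My::Point" object.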
454
455 $json = $json->allow_tags ([$enable])
456 $enabled = $json->get_allow_tags
457 See "OBJECT SERIALISATION" for details.
458
459 If $enable is true (or missing), then "encode", upon encountering a
460 blessed object, will check for the availability of the "FREEZE"
461 method on the object's class. If found, it will be used to
462 serialise the object into a nonstandard tagged JSON value (that
463 JSON decoders cannot decode).
464
465 It also causes "decode" to parse such tagged JSON values and
466 deserialise them via a call to the "THAW" method.
467
468 If $enable is false (the default), then "encode" will not consider
469 this type of conversion, and tagged JSON values will cause a parse
470 error in "decode", as if tags were not part of the grammar.
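
The following sketch round-trips an object through a tagged value. The class "My::Point" is hypothetical, and the calling conventions shown ("FREEZE" receiving the object and a serialiser name, "THAW" receiving the class name, the serialiser name and the frozen values) are assumed to follow the "OBJECT SERIALISATION" section:

```perl
use strict;
use warnings;
use JSON::XS;

package My::Point;   # hypothetical example class

sub new {
   my ($class, $x, $y) = @_;
   return bless { x => $x, y => $y }, $class;
}

# Returns the values needed to rebuild the object later.
sub FREEZE {
   my ($self, $serialiser) = @_;
   return ($self->{x}, $self->{y});
}

# Class method: rebuilds an object from the values FREEZE returned.
sub THAW {
   my ($class, $serialiser, $x, $y) = @_;
   return $class->new ($x, $y);
}

package main;

my $json = JSON::XS->new->allow_tags;
my $text = $json->encode ([ My::Point->new (3, 4) ]);   # nonstandard tagged text
my $copy = $json->decode ($text)->[0];

print ref ($copy), " at ($copy->{x}, $copy->{y})\n";
```

As the tagged text is not valid JSON, this is only suitable when both ends are known to use this extension.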
471
472 $json->boolean_values ([$false, $true])
473 ($false, $true) = $json->get_boolean_values
474 By default, JSON booleans will be decoded as overloaded
475 $Types::Serialiser::false and $Types::Serialiser::true objects.
476
477 With this method you can specify your own boolean values for
478 decoding - on decode, JSON "false" will be decoded as a copy of
479 $false, and JSON "true" will be decoded as $true ("copy" here is
480 the same thing as assigning a value to another variable, i.e.
481 "$copy = $false").
482
483 Calling this method without any arguments will reset the booleans
484 to their default values.
485
486 "get_boolean_values" will return both $false and $true values, or
487 the empty list when they are set to the default.
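
As a sketch, assuming a JSON::XS release recent enough to provide "boolean_values", plain 0 and 1 can be used instead of the default boolean objects:

```perl
use strict;
use warnings;
use JSON::XS;

# Decode JSON false as 0 and JSON true as 1 (note the order: false first).
my $json  = JSON::XS->new->boolean_values (0, 1);
my $flags = $json->decode ('[true,false]');

print "true => $flags->[0], false => $flags->[1]\n";

# The current pair can be queried ...
my ($false, $true) = $json->get_boolean_values;

# ... and calling the setter without arguments restores the defaults,
# after which the getter returns the empty list.
$json->boolean_values;
my @default = $json->get_boolean_values;
```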
488
489 $json = $json->filter_json_object ([$coderef->($hashref)])
490 When $coderef is specified, it will be called from "decode" each
491 time it decodes a JSON object. The only argument is a reference to
492 the newly-created hash. If the code reference returns a single
493 scalar (which need not be a reference), this value (or rather a
494 copy of it) is inserted into the deserialised data structure. If it
495 returns an empty list (NOTE: not "undef", which is a valid scalar),
496 the original deserialised hash will be inserted. This setting can
497 slow down decoding considerably.
498
499 When $coderef is omitted or undefined, any existing callback will
500 be removed and "decode" will not change the deserialised hash in
501 any way.
502
503 Example, convert all JSON objects into the integer 5:
504
my $js = JSON::XS->new->filter_json_object (sub { 5 });
# returns [5]
$js->decode ('[{}]');
# throws an exception when allow_nonref is disabled,
# as a lone 5 is then not allowed.
$js->decode ('{"a":1, "b":2}');
511
512 $json = $json->filter_json_single_key_object ($key [=>
513 $coderef->($value)])
514 Works remotely similar to "filter_json_object", but is only called
515 for JSON objects having a single key named $key.
516
517 This $coderef is called before the one specified via
518 "filter_json_object", if any. It gets passed the single value in
519 the JSON object. If it returns a single value, it will be inserted
520 into the data structure. If it returns nothing (not even "undef"
521 but the empty list), the callback from "filter_json_object" will be
522 called next, as if no single-key callback were specified.
523
524 If $coderef is omitted or undefined, the corresponding callback
525 will be disabled. There can only ever be one callback for a given
526 key.
527
As this callback gets called less often than the
"filter_json_object" one, decoding speed will not usually suffer as
530 much. Therefore, single-key objects make excellent targets to
531 serialise Perl objects into, especially as single-key JSON objects
532 are as close to the type-tagged value concept as JSON gets (it's
533 basically an ID/VALUE tuple). Of course, JSON does not support this
534 in any way, so you need to make sure your data never looks like a
535 serialised Perl hash.
536
537 Typical names for the single object key are "__class_whatever__",
538 or "$__dollars_are_rarely_used__$" or "}ugly_brace_placement", or
539 even things like "__class_md5sum(classname)__", to reduce the risk
540 of clashing with real hashes.
541
542 Example, decode JSON objects of the form "{ "__widget__" => <id> }"
543 into the corresponding $WIDGET{<id>} object:
544
545 # return whatever is in $WIDGET{5}:
546 JSON::XS
547 ->new
548 ->filter_json_single_key_object (__widget__ => sub {
549 $WIDGET{ $_[0] }
550 })
->decode ('{"__widget__": 5}')
552
553 # this can be used with a TO_JSON method in some "widget" class
554 # for serialisation to json:
555 sub WidgetBase::TO_JSON {
556 my ($self) = @_;
557
558 unless ($self->{id}) {
559 $self->{id} = ..get..some..id..;
560 $WIDGET{$self->{id}} = $self;
561 }
562
563 { __widget__ => $self->{id} }
564 }
565
566 $json = $json->shrink ([$enable])
567 $enabled = $json->get_shrink
568 Perl usually over-allocates memory a bit when allocating space for
569 strings. This flag optionally resizes strings generated by either
570 "encode" or "decode" to their minimum size possible. This can save
571 memory when your JSON texts are either very very long or you have
572 many short strings. It will also try to downgrade any strings to
573 octet-form if possible: perl stores strings internally either in an
574 encoding called UTF-X or in octet-form. The latter cannot store
575 everything but uses less space in general (and some buggy Perl or C
576 code might even rely on that internal representation being used).
577
578 The actual definition of what shrink does might change in future
579 versions, but it will always try to save space at the expense of
580 time.
581
582 If $enable is true (or missing), the string returned by "encode"
583 will be shrunk-to-fit, while all strings generated by "decode" will
584 also be shrunk-to-fit.
585
586 If $enable is false, then the normal perl allocation algorithms are
587 used. If you work with your data, then this is likely to be
588 faster.
589
590 In the future, this setting might control other things, such as
591 converting strings that look like integers or floats into integers
592 or floats internally (there is no difference on the Perl level),
593 saving space.
594
595 $json = $json->max_depth ([$maximum_nesting_depth])
596 $max_depth = $json->get_max_depth
597 Sets the maximum nesting level (default 512) accepted while
598 encoding or decoding. If a higher nesting level is detected in JSON
599 text or a Perl data structure, then the encoder and decoder will
600 stop and croak at that point.
601
602 Nesting level is defined by number of hash- or arrayrefs that the
603 encoder needs to traverse to reach a given point or the number of
604 "{" or "[" characters without their matching closing parenthesis
605 crossed to reach a given character in a string.
606
Setting the maximum depth to one disallows any nesting, which
ensures that the object is only a single hash/object or array.
609
610 If no argument is given, the highest possible setting will be used,
611 which is rarely useful.
612
613 Note that nesting is implemented by recursion in C. The default
614 value has been chosen to be as large as typical operating systems
615 allow without crashing.
616
617 See SECURITY CONSIDERATIONS, below, for more info on why this is
618 useful.
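
A brief sketch of the limit in action, with an arbitrarily chosen depth of 2 applied to both decoding and encoding:

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new->max_depth (2);

# Two levels of nesting are within the limit.
my $ok = eval { $json->decode ('[[1]]') };

# Three levels exceed it, so decode croaks and the eval returns undef.
my $too_deep = eval { $json->decode ('[[[1]]]') };

# The same limit applies when encoding a Perl data structure.
my $encoded = eval { $json->encode ([[[1]]]) };

print "nested too deeply\n" unless defined $too_deep;
```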
619
620 $json = $json->max_size ([$maximum_string_size])
621 $max_size = $json->get_max_size
622 Set the maximum length a JSON text may have (in bytes) where
623 decoding is being attempted. The default is 0, meaning no limit.
When "decode" is called on a string that is longer than this many
625 bytes, it will not attempt to decode the string but throw an
626 exception. This setting has no effect on "encode" (yet).
627
628 If no argument is given, the limit check will be deactivated (same
629 as when 0 is specified).
630
631 See SECURITY CONSIDERATIONS, below, for more info on why this is
632 useful.
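
For illustration, a sketch with an arbitrarily small limit of 16 bytes. Real limits would be chosen to match the largest legitimate input:

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new->max_size (16);

# 7 bytes: within the limit, decoded normally.
my $small = eval { $json->decode ('[1,2,3]') };

# Far longer than 16 bytes: rejected before any parsing is attempted.
my $big_text = '[' . join (',', 1 .. 100) . ']';
my $large    = eval { $json->decode ($big_text) };

print "input rejected\n" unless defined $large;
```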
633
634 $json_text = $json->encode ($perl_scalar)
635 Converts the given Perl value or data structure to its JSON
636 representation. Croaks on error.
637
638 $perl_scalar = $json->decode ($json_text)
639 The opposite of "encode": expects a JSON text and tries to parse
640 it, returning the resulting simple scalar or reference. Croaks on
641 error.
642
643 ($perl_scalar, $characters) = $json->decode_prefix ($json_text)
644 This works like the "decode" method, but instead of raising an
645 exception when there is trailing garbage after the first JSON
646 object, it will silently stop parsing there and return the number
647 of characters consumed so far.
648
649 This is useful if your JSON texts are not delimited by an outer
650 protocol and you need to know where the JSON text ends.
651
652 JSON::XS->new->decode_prefix ("[1] the tail")
653 => ([1], 3)
654
INCREMENTAL PARSING
In some cases, there is the need for incremental parsing of JSON texts.
657 While this module always has to keep both JSON text and resulting Perl
658 data structure in memory at one time, it does allow you to parse a JSON
659 stream incrementally. It does so by accumulating text until it has a
660 full JSON object, which it then can decode. This process is similar to
661 using "decode_prefix" to see if a full JSON object is available, but is
662 much more efficient (and can be implemented with a minimum of method
663 calls).
664
665 JSON::XS will only attempt to parse the JSON text once it is sure it
666 has enough text to get a decisive result, using a very simple but truly
667 incremental parser. This means that it sometimes won't stop as early as
668 the full parser, for example, it doesn't detect mismatched parentheses.
669 The only thing it guarantees is that it starts decoding as soon as a
syntactically valid JSON text has been seen. This means you need to set
resource limits (e.g. "max_size") to ensure the parser will stop
parsing in the presence of syntax errors.
673
674 The following methods implement this incremental parser.
675
676 [void, scalar or list context] = $json->incr_parse ([$string])
677 This is the central parsing function. It can both append new text
678 and extract objects from the stream accumulated so far (both of
679 these functions are optional).
680
681 If $string is given, then this string is appended to the already
682 existing JSON fragment stored in the $json object.
683
684 After that, if the function is called in void context, it will
685 simply return without doing anything further. This can be used to
686 add more text in as many chunks as you want.
687
688 If the method is called in scalar context, then it will try to
689 extract exactly one JSON object. If that is successful, it will
690 return this object, otherwise it will return "undef". If there is a
691 parse error, this method will croak just as "decode" would do (one
692 can then use "incr_skip" to skip the erroneous part). This is the
693 most common way of using the method.
694
695 And finally, in list context, it will try to extract as many
696 objects from the stream as it can find and return them, or the
697 empty list otherwise. For this to work, there must be no separators
698 (other than whitespace) between the JSON objects or arrays, instead
699 they must be concatenated back-to-back. If an error occurs, an
700 exception will be raised as in the scalar context case. Note that
701 in this case, any previously-parsed JSON texts will be lost.
702
703 Example: Parse some JSON arrays/objects in a given string and
704 return them.
705
706 my @objs = JSON::XS->new->incr_parse ("[5][7][1,2]");
707
708 $lvalue_string = $json->incr_text
709 This method returns the currently stored JSON fragment as an
710 lvalue, that is, you can manipulate it. This only works when a
711 preceding call to "incr_parse" in scalar context successfully
returned an object. Under all other circumstances you must not call
this function (I mean it: although in simple tests it might
actually work, it will fail under real-world conditions). As a
715 special exception, you can also call this method before having
716 parsed anything.
717
718 That means you can only use this function to look at or manipulate
719 text before or after complete JSON objects, not while the parser is
720 in the middle of parsing a JSON object.
721
722 This function is useful in two cases: a) finding the trailing text
723 after a JSON object or b) parsing multiple JSON objects separated
724 by non-JSON text (such as commas).
725
726 $json->incr_skip
727 This will reset the state of the incremental parser and will remove
728 the parsed text from the input buffer so far. This is useful after
729 "incr_parse" died, in which case the input buffer and incremental
730 parser state is left unchanged, to skip the text parsed so far and
731 to reset the parse state.
732
733 The difference to "incr_reset" is that only text until the parse
734 error occurred is removed.
735
736 $json->incr_reset
737 This completely resets the incremental parser, that is, after this
738 call, it will be as if the parser had never parsed anything.
739
740 This is useful if you want to repeatedly parse JSON objects and
741 want to ignore any trailing data, which means you have to reset the
742 parser after each successful decode.
743
744 LIMITATIONS
745 The incremental parser is a non-exact parser: it works by gathering as
746 much text as possible that could be a valid JSON text, followed by
747 trying to decode it.
748
749 That means it sometimes needs to read more data than strictly necessary
750 to diagnose an invalid JSON text. For example, after parsing the
751 following fragment, the parser could stop with an error, as this
752 fragment cannot be the beginning of a valid JSON text:
753
754 [,
755
In reality, however, the parser might continue to read data until a
length limit is exceeded or it finds a closing bracket.
758
759 EXAMPLES
760 Some examples will make all this clearer. First, a simple example that
761 works similarly to "decode_prefix": We want to decode the JSON object
762 at the start of a string and identify the portion after the JSON
763 object:
764
765 my $text = "[1,2,3] hello";
766
767 my $json = new JSON::XS;
768
769 my $obj = $json->incr_parse ($text)
770 or die "expected JSON object or array at beginning of string";
771
772 my $tail = $json->incr_text;
773 # $tail now contains " hello"
774
775 Easy, isn't it?
776
777 Now for a more complicated example: Imagine a hypothetical protocol
778 where you read some requests from a TCP stream, and each request is a
779 JSON array, without any separation between them (in fact, it is often
780 useful to use newlines as "separators", as these get interpreted as
781 whitespace at the start of the JSON text, which makes it possible to
782 test said protocol with "telnet"...).
783
784 Here is how you'd do it (it is trivial to write this in an event-based
785 manner):
786
787 my $json = new JSON::XS;
788
789 # read some data from the socket
790 while (sysread $socket, my $buf, 4096) {
791
792 # split and decode as many requests as possible
793 for my $request ($json->incr_parse ($buf)) {
794 # act on the $request
795 }
796 }
797
798 Another complicated example: Assume you have a string with JSON objects
799 or arrays, all separated by (optional) comma characters (e.g. "[1],[2],
800 [3]"). To parse them, we have to skip the commas between the JSON
801 texts, and here is where the lvalue-ness of "incr_text" comes in
802 useful:
803
804 my $text = "[1],[2], [3]";
805 my $json = new JSON::XS;
806
807 # void context, so no parsing done
808 $json->incr_parse ($text);
809
810 # now extract as many objects as possible. note the
811 # use of scalar context so incr_text can be called.
812 while (my $obj = $json->incr_parse) {
813 # do something with $obj
814
815 # now skip the optional comma
816 $json->incr_text =~ s/^ \s* , //x;
817 }
818
819 Now let's go for a very complex example: Assume that you have a gigantic
820 JSON array-of-objects, many gigabytes in size, and you want to parse
821 it, but you cannot load it into memory fully (this has actually
822 happened in the real world :).
823
824 Well, you lost, you have to implement your own JSON parser. But
825 JSON::XS can still help you: You implement a (very simple) array parser
826 and let JSON decode the array elements, which are all full JSON objects
827 on their own (this wouldn't work if the array elements could be JSON
828 numbers, for example):
829
830 my $json = new JSON::XS;
831
832 # open the monster
833 open my $fh, "<", "bigfile.json"
834 or die "bigfile: $!";
835
836 # first parse the initial "["
837 for (;;) {
838 sysread $fh, my $buf, 65536
839 or die "read error: $!";
840 $json->incr_parse ($buf); # void context, so no parsing
841
842 # Exit the loop once we found and removed(!) the initial "[".
843 # In essence, we are (ab-)using the $json object as a simple scalar
844 # we append data to.
845 last if $json->incr_text =~ s/^ \s* \[ //x;
846 }
847
848 # now we have skipped the initial "[", so continue
849 # parsing all the elements.
850 for (;;) {
851 # in this loop we read data until we got a single JSON object
852 for (;;) {
853 if (my $obj = $json->incr_parse) {
854 # do something with $obj
855 last;
856 }
857
858 # add more data
859 sysread $fh, my $buf, 65536
860 or die "read error: $!";
861 $json->incr_parse ($buf); # void context, so no parsing
862 }
863
864 # in this loop we read data until we either found and parsed the
865 # separating "," between elements, or the final "]"
866 for (;;) {
867 # first skip whitespace
868 $json->incr_text =~ s/^\s*//;
869
870 # if we find "]", we are done
871 if ($json->incr_text =~ s/^\]//) {
872 print "finished.\n";
873 exit;
874 }
875
876 # if we find ",", we can continue with the next element
877 if ($json->incr_text =~ s/^,//) {
878 last;
879 }
880
881 # if we find anything else, we have a parse error!
882 if (length $json->incr_text) {
883 die "parse error near ", $json->incr_text;
884 }
885
886 # else add more data
887 sysread $fh, my $buf, 65536
888 or die "read error: $!";
889 $json->incr_parse ($buf); # void context, so no parsing
890 }
891 }
892 This is a complex example, but most of the complexity comes from the
893 fact that we are trying to be correct (bear with me if I am wrong, I
894 never ran the above example :).
895
897 This section describes how JSON::XS maps Perl values to JSON values and
898 vice versa. These mappings are designed to "do the right thing" in most
899 circumstances automatically, preserving round-tripping characteristics
900 (what you put in comes out as something equivalent).
901
902 For the more enlightened: note that in the following descriptions,
903 lowercase perl refers to the Perl interpreter, while uppercase Perl
904 refers to the abstract Perl language itself.
905
906 JSON -> PERL
907 object
908 A JSON object becomes a reference to a hash in Perl. No ordering of
909 object keys is preserved (JSON does not preserve object key
910 ordering itself).
911
912 array
913 A JSON array becomes a reference to an array in Perl.
914
915 string
916 A JSON string becomes a string scalar in Perl - Unicode codepoints
917 in JSON are represented by the same codepoints in the Perl string,
918 so no manual decoding is necessary.
919
920 number
921 A JSON number becomes either an integer, numeric (floating point)
922 or string scalar in perl, depending on its range and any fractional
923 parts. On the Perl level, there is no difference between those as
924 Perl handles all the conversion details, but an integer may take
925 slightly less memory and might represent more values exactly than
926 floating point numbers.
927
928 If the number consists of digits only, JSON::XS will try to
929 represent it as an integer value. If that fails, it will try to
930 represent it as a numeric (floating point) value if that is
931 possible without loss of precision. Otherwise it will preserve the
932 number as a string value (in which case you lose roundtripping
933 ability, as the JSON number will be re-encoded to a JSON string).
934
935 Numbers containing a fractional or exponential part will always be
936 represented as numeric (floating point) values, possibly at a loss
937 of precision (in which case you might lose perfect roundtripping
938 ability, but the JSON number will still be re-encoded as a JSON
939 number).
940
941 Note that precision is not accuracy - binary floating point values
942 cannot represent most decimal fractions exactly, and when
943 converting from and to floating point, JSON::XS only guarantees
944 precision up to but not including the least significant bit.
945
946 true, false
947 These JSON atoms become "Types::Serialiser::true" and
948 "Types::Serialiser::false", respectively. They are overloaded to
949 act almost exactly like the numbers 1 and 0. You can check whether
950 a scalar is a JSON boolean by using the
951 "Types::Serialiser::is_bool" function (after "use
952 Types::Serialiser", of course).
953
954 null
955 A JSON null atom becomes "undef" in Perl.
956
957 shell-style comments ("# text")
958 As a nonstandard extension to the JSON syntax that is enabled by
959 the "relaxed" setting, shell-style comments are allowed. They can
960 start anywhere outside strings and go till the end of the line.
961
962 tagged values ("(tag)value").
963 Another nonstandard extension to the JSON syntax, enabled with the
964 "allow_tags" setting, are tagged values. In this implementation,
965 the tag must be a perl package/class name encoded as a JSON string,
966 and the value must be a JSON array encoding optional constructor
967 arguments.
968
969 See "OBJECT SERIALISATION", below, for details.
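
As a quick illustration of the mappings above, the following sketch decodes a small, made-up JSON text and inspects what comes out on the Perl side. The key names are arbitrary:

```perl
use strict;
use warnings;
use JSON::XS;
use Types::Serialiser;

my $data = decode_json '{"list":[1,2.5,"three"],"ok":true,"none":null}';

print ref $data, "\n";           # a JSON object becomes a hash reference
print ref $data->{list}, "\n";   # a JSON array becomes an array reference

# true/false are special objects that behave almost like 1 and 0
print "ok is a boolean\n" if Types::Serialiser::is_bool ($data->{ok});
print "ok is true\n"      if $data->{ok};

# null becomes undef
print "none is undef\n" unless defined $data->{none};

# with "relaxed" enabled, shell-style comments are skipped
my $relaxed = JSON::XS->new->relaxed->decode ("[1, # a comment\n 2]");
```

Tagged values are not shown here, as they need cooperation from the class being deserialised.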
970
971 PERL -> JSON
972 The mapping from Perl to JSON is slightly more difficult, as Perl is a
973 truly typeless language, so we can only guess which JSON type is meant
974 by a Perl value.
975
976 hash references
977 Perl hash references become JSON objects. As there is no inherent
978 ordering in hash keys (or JSON objects), they will usually be
979 encoded in a pseudo-random order. JSON::XS can optionally sort the
980 hash keys (determined by the canonical flag), so the same
981 datastructure will serialise to the same JSON text (given same
982 settings and version of JSON::XS), but this incurs a runtime
983 overhead and is only rarely useful, e.g. when you want to compare
984 some JSON text against another for equality.
985
986 array references
987 Perl array references become JSON arrays.
988
989 other references
990 Other unblessed references are generally not allowed and will cause
991 an exception to be thrown, except for references to the integers 0
992 and 1, which get turned into "false" and "true" atoms in JSON.
993
994 Since "JSON::XS" uses the boolean model from Types::Serialiser, you
995 can also "use Types::Serialiser" and then use
996 "Types::Serialiser::false" and "Types::Serialiser::true" to improve
997 readability.
998
999 use Types::Serialiser;
1000 encode_json [\0, Types::Serialiser::true] # yields [false,true]
1001
1002 Types::Serialiser::true, Types::Serialiser::false
1003 These special values from the Types::Serialiser module become JSON
1004 true and JSON false values, respectively. You can also use "\1" and
1005 "\0" directly if you want.
1006
1007 blessed objects
1008 Blessed objects are not directly representable in JSON, but
1009 "JSON::XS" allows various ways of handling objects. See "OBJECT
1010 SERIALISATION", below, for details.
1011
1012 simple scalars
1013 Simple Perl scalars (any scalar that is not a reference) are the
1014 most difficult objects to encode: JSON::XS will encode undefined
1015 scalars as JSON "null" values, scalars that have last been used in
1016 a string context before encoding as JSON strings, and anything else
1017 as number value:
1018
1019 # dump as number
1020 encode_json [2] # yields [2]
1021 encode_json [-3.0e17] # yields [-3e+17]
1022 my $value = 5; encode_json [$value] # yields [5]
1023
1024 # used as string, so dump as string
1025 print $value;
1026 encode_json [$value] # yields ["5"]
1027
1028 # undef becomes null
1029 encode_json [undef] # yields [null]
1030
1031 You can force the type to be a JSON string by stringifying it:
1032
1033 my $x = 3.1; # some variable containing a number
1034 "$x"; # stringified
1035 $x .= ""; # another, more awkward way to stringify
1036 print $x; # perl does it for you, too, quite often
1037
1038 You can force the type to be a JSON number by numifying it:
1039
1040 my $x = "3"; # some variable containing a string
1041 $x += 0; # numify it, ensuring it will be dumped as a number
1042 $x *= 1; # same thing, the choice is yours.
1043
1044 You cannot currently force the type in other, less obscure, ways.
1045 Tell me if you need this capability (but don't forget to explain
1046 why it's needed :).
1047
1048 Note that numerical precision has the same meaning as under Perl
1049 (so binary to decimal conversion follows the same rules as in Perl,
1050 which can differ from other languages). Also, your perl interpreter
1051 might expose extensions to the floating point numbers of your
1052 platform, such as infinities or NaNs - these cannot be represented
1053 in JSON, and it is an error to pass those in.
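
Putting the Perl-to-JSON rules together, here is a small sketch that forces the types explicitly and uses the canonical flag so the output is predictable. The hash keys are invented for this example:

```perl
use strict;
use warnings;
use JSON::XS;
use Types::Serialiser;

# canonical sorts the hash keys, so the result is reproducible
my $coder = JSON::XS->new->canonical;

my $n = "3"; $n += 0;    # numified, will be dumped as a number
my $s = 3.1; $s .= "";   # stringified, will be dumped as a string

my $text = $coder->encode ({
   number => $n,
   string => $s,
   flag   => \1,                         # reference to 1: true
   off    => Types::Serialiser::false,   # same idea, more readable
   empty  => undef,                      # becomes null
});

print "$text\n";
# {"empty":null,"flag":true,"number":3,"off":false,"string":"3.1"}
```

Without canonical the same members would be emitted, just in whatever order perl happens to store the hash.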
1054
1055 OBJECT SERIALISATION
1056 As JSON cannot directly represent Perl objects, you have to choose
1057 between a pure JSON representation (without the ability to deserialise
1058 the object automatically again), and a nonstandard extension to the
1059 JSON syntax, tagged values.
1060
1061 SERIALISATION
1062
1063 What happens when "JSON::XS" encounters a Perl object depends on the
1064 "allow_blessed", "convert_blessed" and "allow_tags" settings, which are
1065 used in this order:
1066
1067 1. "allow_tags" is enabled and the object has a "FREEZE" method.
1068 In this case, "JSON::XS" uses the Types::Serialiser object
1069 serialisation protocol to create a tagged JSON value, using a
1070 nonstandard extension to the JSON syntax.
1071
1072 This works by invoking the "FREEZE" method on the object, with the
1073 first argument being the object to serialise, and the second
1074 argument being the constant string "JSON" to distinguish it from
1075 other serialisers.
1076
1077 The "FREEZE" method can return any number of values (i.e. zero or
1078 more). These values and the package/classname of the object will
1079 then be encoded as a tagged JSON value in the following format:
1080
1081 ("classname")[FREEZE return values...]
1082
1083 e.g.:
1084
1085 ("URI")["http://www.google.com/"]
1086 ("MyDate")[2013,10,29]
1087 ("ImageData::JPEG")["Z3...VlCg=="]
1088
1089 For example, the hypothetical "My::Object" "FREEZE" method might
1090 use the object's "type" and "id" members to encode the object:
1091
1092 sub My::Object::FREEZE {
1093 my ($self, $serialiser) = @_;
1094
1095 ($self->{type}, $self->{id})
1096 }
1097
1098 2. "convert_blessed" is enabled and the object has a "TO_JSON" method.
1099 In this case, the "TO_JSON" method of the object is invoked in
1100 scalar context. It must return a single scalar that can be directly
1101 encoded into JSON. This scalar replaces the object in the JSON
1102 text.
1103
1104 For example, the following "TO_JSON" method will convert all URI
1105 objects to JSON strings when serialised. The fact that these values
1106 originally were URI objects is lost.
1107
1108 sub URI::TO_JSON {
1109 my ($uri) = @_;
1110 $uri->as_string
1111 }
1112
1113 3. "allow_blessed" is enabled.
1114 The object will be serialised as a JSON null value.
1115
1116 4. none of the above
1117 If none of the settings are enabled or the respective methods are
1118 missing, "JSON::XS" throws an exception.
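
The following sketch walks through cases 2 to 4 with one and the same object. "My::Point" is a hypothetical class made up for this example; only the JSON::XS settings are real:

```perl
use strict;
use warnings;
use JSON::XS;

package My::Point;

sub new { my ($class, %args) = @_; bless {%args}, $class }

# called in scalar context, returns something JSON can represent
sub TO_JSON { my ($self) = @_; [$self->{x}, $self->{y}] }

package main;

my $point = My::Point->new (x => 1, y => 2);

# case 4: nothing enabled, so encode throws an exception
my $died = !eval { JSON::XS->new->encode ([$point]); 1 };

# case 3: allow_blessed only, the object becomes null
my $as_null = JSON::XS->new->allow_blessed->encode ([$point]);

# case 2: convert_blessed, TO_JSON supplies the replacement value
my $converted = JSON::XS->new->convert_blessed->encode ([$point]);

print "$as_null $converted\n";   # [null] [[1,2]]
```

Note that in the "convert_blessed" case nothing in the output says that the array once was a "My::Point".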
1119
1120 DESERIALISATION
1121
1122 For deserialisation there are only two cases to consider: either
1123 nonstandard tagging was used, in which case "allow_tags" decides, or
1124 objects cannot automatically be deserialised, in which case you can
1125 use postprocessing or the "filter_json_object" or
1126 "filter_json_single_key_object" callbacks to get some real objects out
1127 of your JSON.
1128
1129 This section only considers the tagged value case: If a tagged JSON
1130 object is encountered during decoding and "allow_tags" is disabled, a
1131 parse error will result (as if tagged values were not part of the
1132 grammar).
1133
1134 If "allow_tags" is enabled, "JSON::XS" will look up the "THAW" method
1135 of the package/classname used during serialisation (it will not attempt
1136 to load the package as a Perl module). If there is no such method, the
1137 decoding will fail with an error.
1138
1139 Otherwise, the "THAW" method is invoked with the classname as first
1140 argument, the constant string "JSON" as second argument, and all the
1141 values from the JSON array (the values originally returned by the
1142 "FREEZE" method) as remaining arguments.
1143
1144 The method must then return the object. While technically you can
1145 return any Perl scalar, you might have to enable the "allow_nonref"
1146 setting to make that work in all cases, so better return an actual
1147 blessed reference.
1148
1149 As an example, let's implement a "THAW" function that regenerates the
1150 "My::Object" from the "FREEZE" example earlier:
1151
1152 sub My::Object::THAW {
1153 my ($class, $serialiser, $type, $id) = @_;
1154
1155 $class->new (type => $type, id => $id)
1156 }
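
For completeness, here is a sketch of a full round trip through tagged values. It assumes a minimal "My::Object" class with a plain hash-based constructor, which is made up here; the "FREEZE" and "THAW" methods follow the protocol described above:

```perl
use strict;
use warnings;
use JSON::XS;

package My::Object;

sub new { my ($class, %args) = @_; bless {%args}, $class }

sub FREEZE {
   my ($self, $serialiser) = @_;   # $serialiser is the string "JSON"
   ($self->{type}, $self->{id})
}

sub THAW {
   my ($class, $serialiser, $type, $id) = @_;
   $class->new (type => $type, id => $id)
}

package main;

# the same allow_tags setting is needed for encoding and decoding
my $coder = JSON::XS->new->allow_tags;

my $text = $coder->encode ([My::Object->new (type => "widget", id => 42)]);
print "$text\n";   # the object shows up as ("My::Object")[...]

my $copy = $coder->decode ($text)->[0];
print ref $copy, " $copy->{type} $copy->{id}\n";
```

The object is wrapped in an array here so that the example does not depend on the "allow_nonref" setting.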
1157
1159 The interested reader might have seen a number of flags that signify
1160 encodings or codesets - "utf8", "latin1" and "ascii". There seems to be
1161 some confusion on what these do, so here is a short comparison:
1162
1163 "utf8" controls whether the JSON text created by "encode" (and expected
1164 by "decode") is UTF-8 encoded or not, while "latin1" and "ascii" only
1165 control whether "encode" escapes character values outside their
1166 respective codeset range. None of these flags conflict with each
1167 other, although some combinations make less sense than others.
1168
1169 Care has been taken to make all flags symmetrical with respect to
1170 "encode" and "decode", that is, texts encoded with any combination of
1171 these flag values will be correctly decoded when the same flags are
1172 used - in general, if you use different flag settings while encoding
1173 vs. when decoding you likely have a bug somewhere.
1174
1175 Below comes a verbose discussion of these flags. Note that a "codeset"
1176 is simply an abstract set of character-codepoint pairs, while an
1177 encoding takes those codepoint numbers and encodes them, in our case
1178 into octets. Unicode is (among other things) a codeset, UTF-8 is an
1179 encoding, and ISO-8859-1 (= latin 1) and ASCII are both codesets and
1180 encodings at the same time, which can be confusing.
1181
1182 "utf8" flag disabled
1183 When "utf8" is disabled (the default), then "encode"/"decode"
1184 generate and expect Unicode strings, that is, characters with high
1185 ordinal Unicode values (> 255) will be encoded as such characters,
1186 and likewise such characters are decoded as-is, no changes to them
1187 will be done, except "(re-)interpreting" them as Unicode codepoints
1188 or Unicode characters, respectively (to Perl, these are the same
1189 thing in strings unless you do funny/weird/dumb stuff).
1190
1191 This is useful when you want to do the encoding yourself (e.g. when
1192 you want to have UTF-16 encoded JSON texts) or when some other
1193 layer does the encoding for you (for example, when printing to a
1194 terminal using a filehandle that transparently encodes to UTF-8 you
1195 certainly do NOT want to UTF-8 encode your data first and have Perl
1196 encode it another time).
1197
1198 "utf8" flag enabled
1199 If the "utf8"-flag is enabled, "encode"/"decode" will encode all
1200 characters using the corresponding UTF-8 multi-byte sequence, and
1201 will expect your input strings to be encoded as UTF-8, that is, no
1202 "character" of the input string must have any value > 255, as UTF-8
1203 does not allow that.
1204
1205 The "utf8" flag therefore switches between two modes: disabled
1206 means you will get a Unicode string in Perl, enabled means you get
1207 a UTF-8 encoded octet/binary string in Perl.
1208
1209 "latin1" or "ascii" flags enabled
1210 With "latin1" (or "ascii") enabled, "encode" will escape characters
1211 with ordinal values > 255 (> 127 with "ascii") and encode the
1212 remaining characters as specified by the "utf8" flag.
1213
1214 If "utf8" is disabled, then the result is also correctly encoded in
1215 those character sets (as both are proper subsets of Unicode,
1216 meaning that a Unicode string with all character values < 256 is
1217 the same thing as an ISO-8859-1 string, and a Unicode string with
1218 all character values < 128 is the same thing as an ASCII string in
1219 Perl).
1220
1221 If "utf8" is enabled, you still get a correct UTF-8-encoded string,
1222 regardless of these flags, just some more characters will be
1223 escaped using "\uXXXX" than before.
1224
1225 Note that ISO-8859-1-encoded strings are not compatible with UTF-8
1226 encoding, while ASCII-encoded strings are. That is because the
1227 ISO-8859-1 encoding is NOT a subset of UTF-8 (despite the
1228 ISO-8859-1 codeset being a subset of Unicode), while ASCII is.
1229
1230 Surprisingly, "decode" will ignore these flags and so treat all
1231 input values as governed by the "utf8" flag. If it is disabled,
1232 this allows you to decode ISO-8859-1- and ASCII-encoded strings, as
1233 both are strict subsets of Unicode. If it is enabled, you can correctly
1234 decode UTF-8 encoded strings.
1235
1236 So neither "latin1" nor "ascii" is incompatible with the "utf8"
1237 flag - they only govern when the JSON output engine escapes a
1238 character or not.
1239
1240 The main use for "latin1" is to relatively efficiently store binary
1241 data as JSON, at the expense of breaking compatibility with most
1242 JSON decoders.
1243
1244 The main use for "ascii" is to force the output to not contain
1245 characters with values > 127, which means you can interpret the
1246 resulting string as UTF-8, ISO-8859-1, ASCII, KOI8-R or just about
1247 any character set and 8-bit-encoding, and still get the same data
1248 structure back. This is useful when your channel for JSON transfer
1249 is not 8-bit clean or the encoding might be mangled in between
1250 (e.g. in mail), and works because ASCII is a proper subset of most
1251 8-bit and multibyte encodings in use in the world.
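
A short sketch may help to see the flags side by side. It encodes the same two-character string (U+00E9 and U+263A, chosen arbitrarily) with four different settings and compares the lengths of the results:

```perl
use strict;
use warnings;
use JSON::XS;

my $data = ["\x{e9}\x{263a}"];   # one latin1 character, one beyond latin1

# no flags: a Unicode string, nothing is escaped or encoded
my $unicode = JSON::XS->new->encode ($data);

# utf8: an octet string, both characters become multi-byte sequences
my $octets = JSON::XS->new->utf8->encode ($data);

# latin1: only the character > 255 is escaped
my $latin1 = JSON::XS->new->latin1->encode ($data);

# ascii: every character > 127 is escaped
my $ascii = JSON::XS->new->ascii->encode ($data);

printf "%d %d %d %d\n",
   length $unicode, length $octets, length $latin1, length $ascii;
# 6 9 11 16

# decoding with the same flags gives back the original string
my $back = JSON::XS->new->latin1->decode ($latin1);
```

The "ascii" result is the longest, but it is also the only one that survives being misinterpreted as some other 8-bit encoding on the way.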
1252
1253 JSON and ECMAscript
1254 JSON syntax is based on how literals are represented in javascript (the
1255 not-standardised predecessor of ECMAscript) which is presumably why it
1256 is called "JavaScript Object Notation".
1257
1258 However, JSON is not a subset (and also not a superset of course) of
1259 ECMAscript (the standard) or javascript (whatever browsers actually
1260 implement).
1261
1262 If you want to use javascript's "eval" function to "parse" JSON, you
1263 might run into parse errors for valid JSON texts, or the resulting data
1264 structure might not be queryable:
1265
1266 One of the problems is that U+2028 and U+2029 are valid characters
1267 inside JSON strings, but are not allowed in ECMAscript string literals,
1268 so the following Perl fragment will not output something that can be
1269 guaranteed to be parsable by javascript's "eval":
1270
1271 use JSON::XS;
1272
1273 print encode_json [chr 0x2028];
1274
1275 The right fix for this is to use a proper JSON parser in your
1276 javascript programs, and not rely on "eval" (see for example Douglas
1277 Crockford's json2.js parser).
1278
1279 If this is not an option, you can, as a stop-gap measure, simply encode
1280 to ASCII-only JSON:
1281
1282 use JSON::XS;
1283
1284 print JSON::XS->new->ascii->encode ([chr 0x2028]);
1285
1286 Note that this will enlarge the resulting JSON text quite a bit if you
1287 have many non-ASCII characters. You might be tempted to run some
1288 regexes to only escape U+2028 and U+2029, e.g.:
1289
1290 # DO NOT USE THIS!
1291 my $json = JSON::XS->new->utf8->encode ([chr 0x2028]);
1292 $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
1293 $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
1294 print $json;
1295
1296 Note that this is a bad idea: the above only works for U+2028 and
1297 U+2029 and thus only for fully ECMAscript-compliant parsers. Many
1298 existing javascript implementations, however, have issues with other
1299 characters as well - using "eval" naively simply will cause problems.
1300
1301 Another problem is that some javascript implementations reserve some
1302 property names for their own purposes (which probably makes them non-
1303 ECMAscript-compliant). For example, Iceweasel reserves the "__proto__"
1304 property name for its own purposes.
1305
1306 If that is a problem, you could try to filter the resulting JSON
1307 output for these property strings, e.g.:
1308
1309 $json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
1310
1311 This works because "__proto__" is not valid outside of strings, so
1312 every occurrence of ""__proto__"\s*:" must be a string used as property
1313 name.
1314
1315 If you know of other incompatibilities, please let me know.
1316
1317 JSON and YAML
1318 You often hear that JSON is a subset of YAML. This is, however, a mass
1319 hysteria(*) and very far from the truth (as of the time of this
1320 writing), so let me state it clearly: in general, there is no way to
1321 configure JSON::XS to output a data structure as valid YAML that works
1322 in all cases.
1323
1324 If you really must use JSON::XS to generate YAML, you should use this
1325 algorithm (subject to change in future versions):
1326
1327 my $to_yaml = JSON::XS->new->utf8->space_after (1);
1328 my $yaml = $to_yaml->encode ($ref) . "\n";
1329
1330 This will usually generate JSON texts that also parse as valid YAML.
1331 Please note that YAML has hardcoded limits on (simple) object key
1332 lengths that JSON doesn't have and also has different and incompatible
1333 unicode character escape syntax, so you should make sure that your hash
1334 keys are noticeably shorter than the 1024 "stream characters" YAML
1335 allows and that you do not have characters with codepoint values
1336 outside the Unicode BMP (Basic Multilingual Plane). YAML also does not
1337 allow "\/" sequences in strings (which JSON::XS does not currently
1338 generate, but other JSON generators might).
1339
1340 There might be other incompatibilities that I am not aware of (or the
1341 YAML specification has been changed yet again - it does so quite
1342 often). In general you should not try to generate YAML with a JSON
1343 generator or vice versa, or try to parse JSON with a YAML parser or
1344 vice versa: chances are high that you will run into severe
1345 interoperability problems when you least expect it.
1346
1347 (*) I have been pressured multiple times by Brian Ingerson (one of the
1348 authors of the YAML specification) to remove this paragraph,
1349 despite him acknowledging that the actual incompatibilities exist.
1350 As I was personally bitten by this "JSON is YAML" lie, I refused
1351 and said I will continue to educate people about these issues, so
1352 others do not run into the same problem again and again. After
1353 this, Brian called me a (quote)complete and worthless
1354 idiot(unquote).
1355
1356 In my opinion, instead of pressuring and insulting people who
1357 actually clarify issues with YAML and the wrong statements of some
1358 of its proponents, I would kindly suggest reading the JSON spec
1359 (which is not that difficult or long) and finally make YAML
1360 compatible to it, and educating users about the changes, instead of
1361 spreading lies about the real compatibility for many years and
1362 trying to silence people who point out that it isn't true.
1363
1364 Addendum/2009: the YAML 1.2 spec is still incompatible with JSON,
1365 even though the incompatibilities have been documented (and are
1366 known to Brian) for many years and the spec makes explicit claims
1367 that YAML is a superset of JSON. It would be so easy to fix, but
1368 apparently, bullying people and corrupting userdata is so much
1369 easier.
1370
1371 SPEED
1372 It seems that JSON::XS is surprisingly fast, as shown in the following
1373 tables. They have been generated with the help of the "eg/bench"
1374 program in the JSON::XS distribution, to make it easy to compare on
1375 your own system.
1376
1377 First comes a comparison between various modules using a very short
1378 single-line JSON string (also available at
1379 <http://dist.schmorp.de/misc/json/short.json>).
1380
1381 {"method": "handleMessage", "params": ["user1",
1382 "we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7,
1383 1, 0]}
1384
1385 It shows the number of encodes/decodes per second (JSON::XS uses the
1386 functional interface, while JSON::XS/2 uses the OO interface with
1387 pretty-printing and hashkey sorting enabled, JSON::XS/3 enables shrink.
1388 JSON::DWIW/DS uses the deserialise function, while JSON::DWIW::FJ uses
1389 the from_json method). Higher is better:
1390
1391 module | encode | decode |
1392 --------------|------------|------------|
1393 JSON::DWIW/DS | 86302.551 | 102300.098 |
1394 JSON::DWIW/FJ | 86302.551 | 75983.768 |
1395 JSON::PP | 15827.562 | 6638.658 |
1396 JSON::Syck | 63358.066 | 47662.545 |
1397 JSON::XS | 511500.488 | 511500.488 |
1398 JSON::XS/2 | 291271.111 | 388361.481 |
1399 JSON::XS/3 | 361577.931 | 361577.931 |
1400 Storable | 66788.280 | 265462.278 |
1401 --------------+------------+------------+
1402
1403 That is, JSON::XS is almost six times faster than JSON::DWIW on
1404 encoding, about five times faster on decoding, and over thirty to
1405 seventy times faster than JSON's pure perl implementation. It also
1406 compares favourably to Storable for small amounts of data.
1407
1408 Using a longer test string (roughly 18KB, generated from Yahoo! Locals
1409 search API, <http://dist.schmorp.de/misc/json/long.json>):
1410
1411 module | encode | decode |
1412 --------------|------------|------------|
1413 JSON::DWIW/DS | 1647.927 | 2673.916 |
1414 JSON::DWIW/FJ | 1630.249 | 2596.128 |
1415 JSON::PP | 400.640 | 62.311 |
1416 JSON::Syck | 1481.040 | 1524.869 |
1417 JSON::XS | 20661.596 | 9541.183 |
1418 JSON::XS/2 | 10683.403 | 9416.938 |
1419 JSON::XS/3 | 20661.596 | 9400.054 |
1420 Storable | 19765.806 | 10000.725 |
1421 --------------+------------+------------+
1422
1423 Again, JSON::XS leads by far (except for Storable which,
1424 unsurprisingly, decodes a bit faster).
1425
1426 On large strings containing lots of high Unicode characters, some
1427 modules (such as JSON::PC) seem to decode faster than JSON::XS, but the
1428 result will be broken due to missing (or wrong) Unicode handling.
1429 Others refuse to decode or encode properly, so it was impossible to
1430 prepare a fair comparison table for that case.
1431
1433 When you are using JSON in a protocol, talking to untrusted potentially
1434 hostile creatures requires relatively few measures.
1435
1436 First of all, your JSON decoder should be secure, that is, should not
1437 have any buffer overflows. Obviously, this module should ensure that
1438 and I am trying hard on making that true, but you never know.
1439
1440 Second, you need to avoid resource-starving attacks. That means you
1441 should limit the size of JSON texts you accept, or make sure that when
1442 your resources run out, that's just fine (e.g. by using a separate
1443 process that can crash safely). The size of a JSON text in octets or
1444 characters is usually a good indication of the size of the resources
1445 required to decode it into a Perl structure. While JSON::XS can check
1446 the size of the JSON text, it might be too late when you already have
1447 it in memory, so you might want to check the size before you accept the
1448 string.
1449
1450 Third, JSON::XS recurses using the C stack when decoding objects and
1451 arrays. The C stack is a limited resource: for instance, on my amd64
1452 machine with 8MB of stack size I can decode around 180k nested arrays
1453 but only 14k nested JSON objects (due to perl itself recursing deeply
1454 on croak to free the temporary). If that is exceeded, the program
1455 crashes. To be conservative, the default nesting limit is set to 512.
1456 If your process has a smaller stack, you should adjust this setting
1457 accordingly with the "max_depth" method.
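
As a sketch of the second and third point, the following decoder rejects texts that are too large or too deeply nested. The limits (64 octets, two nesting levels) are deliberately tiny and only chosen to make the effect visible:

```perl
use strict;
use warnings;
use JSON::XS;

my $strict = JSON::XS->new->max_depth (2)->max_size (64);

# two nesting levels are within the limit
my $ok = eval { $strict->decode ('[[1]]') };

# three nesting levels exceed max_depth, so decode croaks
my $too_deep = !eval { $strict->decode ('[[[1]]]'); 1 };

# a text longer than max_size is refused before any parsing is done
my $big_text = '[' . join (',', (1) x 100) . ']';
my $too_big  = !eval { $strict->decode ($big_text); 1 };

print "deep text rejected\n" if $too_deep;
print "big text rejected\n"  if $too_big;
```

Remember that "max_size" only helps once the text is already in memory, so a network server should limit how much it reads as well.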
1458
1459 Something else could bomb you, too, that I forgot to think of. In that
1460 case, you get to keep the pieces. I am always open to hints, though...
1461
1462 Also keep in mind that JSON::XS might leak contents of your Perl data
1463 structures in its error messages, so when you serialise sensitive
1464 information you might want to make sure that exceptions thrown by
1465 JSON::XS will not end up in front of untrusted eyes.
1466
1467 If you are using JSON::XS to return packets for consumption by
1468 JavaScript scripts in a browser you should have a look at
1469 <http://blog.archive.jpsykes.com/47/practical-csrf-and-json-security/>
1470 to see whether you are vulnerable to some common attack vectors (which
1471 really are browser design bugs, but it is still you who will have to
1472 deal with it, as major browser developers care only for features, not
1473 about getting security right).
1474
1475 "OLD" VS. "NEW" JSON (RFC4627 VS. RFC7159)
1476 JSON originally required JSON texts to represent an array or object -
1477 scalar values were explicitly not allowed. This has changed, and
1478 versions of JSON::XS beginning with 4.0 reflect this by allowing scalar
1479 values by default.
1480
1481 One reason why one might not want this is that this removes a
1482 fundamental property of JSON texts, namely that they are self-delimited
1483 and self-contained, or in other words, you could take any number of
1484 "old" JSON texts and paste them together, and the result would be
1485 unambiguously parseable:
1486
1487 [1,3]{"k":5}[][null] # four JSON texts, without doubt
1488
1489 By allowing scalars, this property is lost: in the following example,
1490 is this one JSON text (the number 12) or two JSON texts (the numbers 1
1491 and 2):
1492
1493 12 # could be 12, or 1 and 2
1494
1495 Another lost property of "old" JSON is that no lookahead is required to
1496 know the end of a JSON text, i.e. the JSON text definitely ended at the
1497 last "]" or "}" character, there was no need to read extra characters.
1498
1499 For example, a viable network protocol with "old" JSON was to simply
1500 exchange JSON texts without delimiter. For "new" JSON, you have to use
1501 a suitable delimiter (such as a newline) after every JSON text or
1502 ensure you never encode/decode scalar values.
1503
1504 Most protocols do work by only transferring arrays or objects, and the
1505 easiest way to avoid problems with the "new" JSON definition is to
1506 explicitly disallow scalar values in your encoder and decoder:
1507
1508 $json_coder = JSON::XS->new->allow_nonref (0);
1509
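A short sketch of what this setting does in practice: arrays and objects
still work, while a bare scalar is rejected by both "encode" and "decode".
The variable names are illustrative, and the JSON::PP fallback is only
there so the example also runs where JSON::XS is not installed.

```perl
use strict;
use warnings;

# Prefer JSON::XS; fall back to the API-compatible JSON::PP (core Perl).
my $class = eval { require JSON::XS; 1 } ? "JSON::XS" : do { require JSON::PP; "JSON::PP" };

my $coder = $class->new->utf8->allow_nonref (0);

# Arrays and objects are encoded as usual.
my $text = $coder->encode ([1, 3]);

# A bare scalar makes both encode and decode croak, which eval traps here.
my $encoded_scalar = eval { $coder->encode (12);   1 } ? 1 : 0;
my $decoded_scalar = eval { $coder->decode ("12"); 1 } ? 1 : 0;

print "array text: $text\n";
print "scalar encode accepted: $encoded_scalar\n";
print "scalar decode accepted: $decoded_scalar\n";
```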
1510 This is a somewhat unhappy situation, and the blame can fully be put on
1511 JSON's inventor, Douglas Crockford, who unilaterally changed the
1512 format in 2006 without consulting the IETF, forcing the IETF to either
1513 fork the format or go with it (as I was told, the IETF wasn't amused).
1514
1516 JSON is a somewhat sloppily-defined format - it carries around obvious
1517 Javascript baggage, such as not really defining number range, probably
1518 because Javascript only has one type of numbers: IEEE 64 bit floats
1519 ("binary64").
1520
1521 For this reason, RFC7493 defines "Internet JSON", which is a restricted
1522 subset of JSON that is supposedly more interoperable on the internet.
1523
1524 While "JSON::XS" does not offer specific support for I-JSON, it of
1525 course accepts valid I-JSON and by default implements some of the
1526 limitations of I-JSON, such as parsing numbers as perl numbers, which
1527 are usually a superset of binary64 numbers.
1528
1529 To generate I-JSON, follow these rules:
1530
1531 • always generate UTF-8
1532
1533 I-JSON must be encoded in UTF-8, the default for "encode_json".
1534
1535 • numbers should be within IEEE 754 binary64 range
1536
1537 Basically all existing perl installations use binary64 to represent
1538 floating point numbers, so all you need to do is to avoid large
1539 integers.
1540
1541 • objects must not have duplicate keys
1542
1543 This is trivially done, as "JSON::XS" does not allow duplicate
1544 keys.
1545
1546 • do not generate scalar JSON texts, use "->allow_nonref (0)"
1547
1548 I-JSON strongly requests you to only encode arrays and objects into
1549 JSON.
1550
1551 • times should be strings in ISO 8601 format
1552
1553 There are a myriad of modules on CPAN dealing with ISO 8601 -
1554 search for "ISO8601" on CPAN and use one.
1555
1556 • encode binary data as base64
1557
1558 While it's tempting to just dump binary data as a string (and let
1559 "JSON::XS" do the escaping), for I-JSON, it's recommended to encode
1560 binary data as base64.
1561
1562 There are some other considerations - read RFC7493 for the details if
1563 interested.
1564
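The rules above could be combined roughly as follows. This is only a
sketch: the field names, the sample timestamp and the sample octets are
made up, MIME::Base64 and POSIX::strftime are just one possible choice for
base64 and ISO 8601 formatting, and "canonical" is enabled merely to make
the output order predictable.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
use POSIX qw(strftime);

# Prefer JSON::XS; fall back to the API-compatible JSON::PP (core Perl).
my $class = eval { require JSON::XS; 1 } ? "JSON::XS" : do { require JSON::PP; "JSON::PP" };

# UTF-8 output, sorted keys, and no scalar JSON texts at the top level.
my $ijson = $class->new->utf8->canonical->allow_nonref (0);

my $binary = "\x00\x01\x02";   # arbitrary octets (sample data)
my $epoch  = 0;                # sample timestamp (assumption)

my $text = $ijson->encode ({
   created => strftime ("%Y-%m-%dT%H:%M:%SZ", gmtime ($epoch)),  # ISO 8601, UTC
   payload => encode_base64 ($binary, ""),                       # base64, no line breaks
   count   => 42,                                                # small integer, well within binary64
});

print "$text\n";
```

Duplicate keys cannot occur here because the data comes from a Perl hash.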
1566 "JSON::XS" uses the Types::Serialiser module to provide boolean
1567 constants. That means that the JSON true and false values will be
1568 compatible with true and false values of other modules that do the same,
1569 such as JSON::PP and CBOR::XS.
1570
1572 As long as you only serialise data that can be directly expressed in
1573 JSON, "JSON::XS" is incapable of generating invalid JSON output (modulo
1574 bugs, but "JSON::XS" has found more bugs in the official JSON testsuite
1575 (1) than the official JSON testsuite has found in "JSON::XS" (0)).
1576
1577 When you have trouble decoding JSON generated by this module using
1578 other decoders, then it is very likely that you have an encoding
1579 mismatch or the other decoder is broken.
1580
1581 When decoding, "JSON::XS" is strict by default and will likely catch
1582 all errors. There are currently two settings that change this:
1583 "relaxed" makes "JSON::XS" accept (but not generate) some non-standard
1584 extensions, and "allow_tags" will allow you to encode and decode Perl
1585 objects, at the cost of not outputting valid JSON anymore.
1586
1587 TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS
1588 When you use "allow_tags" to use the extended (and also nonstandard and
1589 invalid) JSON syntax for serialised objects, and you still want to
1590 decode the generated text with a standard JSON decoder, you can run a
1591 regex to replace the tagged syntax by standard JSON arrays (it only
1592 works for "normal" package names without commas, newlines or single
1593 colons). First, the readable Perl version:
1594
1595 # if your FREEZE methods return no values, you need this replace first:
1596 $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;
1597
1598 # this works for non-empty constructor arg lists:
1599 $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;
1600
1601 And here is a less readable version that is easy to adapt to other
1602 languages:
1603
1604 $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;
1605
1606 Here is an ECMAScript version (same regex):
1607
1608 json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,");
1609
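To see the two readable substitutions at work, here is a small sketch that
applies them to a hand-written sample. The class names "My::Class" and
"Other" and the surrounding object are invented for this example.

```perl
use strict;
use warnings;

# Sample tagged text in the ("class")[args] form (class names are made up).
my $json = '{"obj":("My::Class")[1,"two"],"empty":("Other")[]}';

# Empty constructor argument lists have to be handled first ...
$json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;

# ... then the non-empty ones.
$json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;

print "$json\n";
```

After both substitutions the text contains only standard JSON arrays, with
the class name as the first element of each former tagged value.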
1610 Since this syntax converts to standard JSON arrays, it might be hard to
1611 distinguish serialised objects from normal arrays. You can prepend a
1612 "magic number" as first array element to reduce chances of a collision:
1613
1614 $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g;
1615
1616 And after decoding the JSON text, you could walk the data structure
1617 looking for arrays with a first element of
1618 "XU1peReLzT4ggEllLanBYq4G9VzliwKF".
1619
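Such a walk might look like the sketch below. "find_tagged" is a
hypothetical helper written for this example and is not part of JSON::XS;
the sample structure stands in for whatever your decoder returned.

```perl
use strict;
use warnings;

my $MAGIC = "XU1peReLzT4ggEllLanBYq4G9VzliwKF";

# Recursively collect [class, @args] entries from arrays tagged with the magic string.
# find_tagged is a hypothetical helper, not part of JSON::XS.
sub find_tagged {
   my ($data, $found) = @_;

   if (ref $data eq "ARRAY") {
      if (@$data >= 2 && defined $data->[0] && !ref $data->[0] && $data->[0] eq $MAGIC) {
         my (undef, $class, @args) = @$data;
         push @$found, [$class, @args];
      }
      find_tagged ($_, $found) for @$data;
   } elsif (ref $data eq "HASH") {
      find_tagged ($_, $found) for values %$data;
   }

   $found
}

# Already-decoded structure (sample data).
my $decoded = { obj => [$MAGIC, "My::Class", 1, "two"], list => [1, 2, [3]] };
my $found   = find_tagged ($decoded, []);

printf "%s with %d argument(s)\n", $_->[0], @$_ - 1 for @$found;
```

What to do with each hit (for example calling a constructor) is left to
the application; the sketch only locates the candidates.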
1620 The same approach can be used to create the tagged format with another
1621 encoder. First, you create an array with the magic string as first
1622 member, the classname as second, and constructor arguments last, encode
1623 it as part of your JSON structure, and then:
1624
1625 $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;
1626
1627 Again, this has some limitations - the magic string must not be encoded
1628 with character escapes, and the constructor arguments must be non-
1629 empty.
1630
1632 This module is not guaranteed to be ithread (or MULTIPLICITY-) safe and
1633 there are no plans to change this. Note that perl's builtin so-called
1634 threads/ithreads are officially deprecated and should not be used.
1635
1637 Sometimes people avoid the Perl locale support and directly call the
1638 system's setlocale function with "LC_ALL".
1639
1640 This breaks both perl and modules such as JSON::XS, as stringification
1641 of numbers no longer works correctly (e.g. "$x = 0.1; print "$x"+1"
1642 might print 1, and JSON::XS might output illegal JSON as JSON::XS
1643 relies on perl to stringify numbers).
1644
1645 The solution is simple: don't call "setlocale", or use it for only
1646 those categories you need, such as "LC_MESSAGES" or "LC_CTYPE".
1647
1648 If you need "LC_NUMERIC", you should enable it only around the code
1649 that actually needs it (avoiding stringification of numbers), and
1650 restore it afterwards.
1651
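One way to scope "LC_NUMERIC" is to save the current setting, switch, and
restore it before any number is stringified. This is a sketch using
POSIX::setlocale; the locale name "de_DE.UTF-8" is only an example and may
not exist on your system, in which case the call fails and nothing changes.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_NUMERIC);

# Remember the current LC_NUMERIC setting so it can be restored later.
my $saved = setlocale (LC_NUMERIC);

# Example locale name only (assumption); setlocale returns undef if it is unavailable.
setlocale (LC_NUMERIC, "de_DE.UTF-8");

# ... code that really needs LC_NUMERIC goes here; avoid stringifying numbers ...

# Restore before any number is stringified or handed to a JSON encoder.
setlocale (LC_NUMERIC, $saved) if defined $saved;

my $x = 0.5;
print "$x\n";
```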
1653 At the time this module was created there already were a number of JSON
1654 modules available on CPAN, so what was the reason to write yet another
1655 JSON module? While it seems there are many JSON modules, none of them
1656 correctly handled all corner cases, and in most cases their maintainers
1657 are unresponsive, gone missing, or not listening to bug reports for
1658 other reasons.
1659
1660 Beginning with version 2.0 of the JSON module, when both JSON and
1661 JSON::XS are installed, then JSON will use JSON::XS (this can
1662 be overridden) with no overhead due to emulation (by inheriting
1663 constructor and methods). If JSON::XS is not available, it will fall
1664 back to the compatible JSON::PP module as backend, so using JSON
1665 instead of JSON::XS gives you a portable JSON API that can be fast when
1666 you need it and doesn't require a C compiler when that is a problem.
1667
1668 Somewhere around version 3, this module was forked into
1669 "Cpanel::JSON::XS", because its maintainer had serious trouble
1670 understanding JSON and insisted on a fork with many bugs "fixed" that
1671 weren't actually bugs, while spreading FUD about this module without
1672 actually giving any details on his accusations. You be the judge, but
1673 in my personal opinion, if you want quality, you will stay away from
1674 dangerous forks like that.
1675
1677 While the goal of this module is to be correct, that unfortunately does
1678 not mean it's bug-free, only that I think its design is bug-free. If
1679 you keep reporting bugs they will be fixed swiftly, though.
1680
1681 Please refrain from using rt.cpan.org or any other bug reporting
1682 service. I put the contact address into my modules for a reason.
1683
1685 The json_xs command line utility for quick experiments.
1686
1688 Marc Lehmann <schmorp@schmorp.de>
1689 http://home.schmorp.de/
1690
1691
1692
1693perl v5.32.1 2021-02-22 XS(3)