XML::Simple(3) User Contributed Perl Documentation XML::Simple(3)

XML::Simple - An API for simple XML files

PLEASE DO NOT USE THIS MODULE IN NEW CODE. If you ignore this warning
and use it anyway, the qw(:strict) mode will save you a little pain.

    use XML::Simple qw(:strict);

    my $ref = XMLin([<xml file or string>] [, <options>]);

    my $xml = XMLout($hashref [, <options>]);

Or the object oriented way:

    use XML::Simple qw(:strict);

    my $xs = XML::Simple->new([<options>]);

    my $ref = $xs->XMLin([<xml file or string>] [, <options>]);

    my $xml = $xs->XMLout($hashref [, <options>]);

(or see "SAX SUPPORT" for 'the SAX way').

Note, in these examples, the square brackets are used to denote
optional items, not to imply that items should be supplied in arrayrefs.

The use of this module in new code is strongly discouraged. Other
modules are available which provide more straightforward and consistent
interfaces. In particular, XML::LibXML is highly recommended and you
can refer to Perl XML::LibXML by Example
<http://grantm.github.io/perl-libxml-by-example/> for a tutorial
introduction.

XML::Twig is another excellent alternative.

The major problems with this module are the large number of options
(some of which have unfortunate defaults) and the arbitrary ways in
which these options interact - often producing unexpected results.

Patches with bug fixes and documentation fixes are welcome, but new
features are unlikely to be added.

Say you have a script called foo and a file of configuration options
called foo.xml containing the following:

    <config logdir="/var/log/foo/" debugfile="/tmp/foo.debug">
      <server name="sahara" osname="solaris" osversion="2.6">
        <address>10.0.0.101</address>
        <address>10.0.1.101</address>
      </server>
      <server name="gobi" osname="irix" osversion="6.5">
        <address>10.0.0.102</address>
      </server>
      <server name="kalahari" osname="linux" osversion="2.0.34">
        <address>10.0.0.103</address>
        <address>10.0.1.103</address>
      </server>
    </config>

The following lines of code in foo:

    use XML::Simple qw(:strict);

    my $config = XMLin(undef, KeyAttr => { server => 'name' },
                       ForceArray => [ 'server', 'address' ]);

will 'slurp' the configuration options into the hashref $config
(because no filename or XML string was passed as the first argument to
XMLin(), the name and location of the XML file will be inferred from
the name and location of the script). You can dump out the contents of
the hashref using Data::Dumper:

    use Data::Dumper;

    print Dumper($config);

which will produce something like this (formatting has been adjusted
for brevity):

    {
      'logdir'    => '/var/log/foo/',
      'debugfile' => '/tmp/foo.debug',
      'server'    => {
        'sahara'   => {
          'osversion' => '2.6',
          'osname'    => 'solaris',
          'address'   => [ '10.0.0.101', '10.0.1.101' ]
        },
        'gobi'     => {
          'osversion' => '6.5',
          'osname'    => 'irix',
          'address'   => [ '10.0.0.102' ]
        },
        'kalahari' => {
          'osversion' => '2.0.34',
          'osname'    => 'linux',
          'address'   => [ '10.0.0.103', '10.0.1.103' ]
        }
      }
    }

Your script could then access the name of the log directory like this:

    print $config->{logdir};

Similarly, the second address on the server 'kalahari' could be
referenced as:

    print $config->{server}->{kalahari}->{address}->[1];

Note: If the mapping between the output of Data::Dumper and the print
statements above is not obvious to you, then please refer to the
'references' tutorial (AKA: "Mark's very short tutorial about
references") at perlreftut.

In this example, the "ForceArray" option was used to list elements that
might occur multiple times and should therefore be represented as
arrayrefs (even when only one element is present).

The "KeyAttr" option was used to indicate that each "<server>" element
has a unique identifier in the "name" attribute. This allows you to
index directly to a particular server record using the name as a hash
key (as shown above).
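
The result is an ordinary nested Perl data structure, so the usual hash and array idioms apply to it. The sketch below assumes the hashref has exactly the shape shown in the Data::Dumper output above. It writes that structure out literally so that it runs without XML::Simple or foo.xml. The helper server_addresses() is a hypothetical name used only for illustration:

```perl
use strict;
use warnings;

# Literal copy of the structure shown above, as produced by
# KeyAttr => { server => 'name' } and ForceArray => [ 'server', 'address' ].
my $config = {
    logdir    => '/var/log/foo/',
    debugfile => '/tmp/foo.debug',
    server    => {
        sahara   => { osversion => '2.6',    osname => 'solaris',
                      address   => [ '10.0.0.101', '10.0.1.101' ] },
        gobi     => { osversion => '6.5',    osname => 'irix',
                      address   => [ '10.0.0.102' ] },
        kalahari => { osversion => '2.0.34', osname => 'linux',
                      address   => [ '10.0.0.103', '10.0.1.103' ] },
    },
};

# Hypothetical helper: all addresses of one server, looked up by name.
# Returns an empty list if no server has that name.
sub server_addresses {
    my ($cfg, $name) = @_;
    my $server = $cfg->{server}{$name} or return;
    return @{ $server->{address} };
}

# Walk every server in a stable (sorted) order.
for my $name (sort keys %{ $config->{server} }) {
    my $s = $config->{server}{$name};
    printf "%-8s %-7s %s\n", $name, $s->{osname},
        join(', ', server_addresses($config, $name));
}
```

Because 'address' was listed in ForceArray, the helper never has to check whether it got a scalar or an arrayref. Without that option, the single address of 'gobi' would come back as a plain string.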

For simple requirements, that's really all there is to it. If you want
to store your XML in a different directory or file, or pass it in as a
string or even pass it in via some derivative of an IO::Handle, you'll
need to check out "OPTIONS". If you want to turn off or tweak the
array folding feature (that neat little transformation that produced
$config->{server}), you'll find options for that as well.

If you want to generate XML (for example to write a modified version of
$config back out as XML), check out XMLout().

If your needs are not so simple, this may not be the module for you.
In that case, you might want to read "WHERE TO FROM HERE?".

The XML::Simple module provides a simple API layer on top of an
underlying XML parsing module (either XML::Parser or one of the SAX2
parser modules). Two functions are exported: XMLin() and XMLout().
Note: you can explicitly request the lower case versions of the
function names: xml_in() and xml_out().

The simplest approach is to call these two functions directly, but an
optional object oriented interface (see "OPTIONAL OO INTERFACE" below)
allows them to be called as methods of an XML::Simple object. The
object interface can also be used at either end of a SAX pipeline.

XMLin()
    Parses XML formatted data and returns a reference to a data structure
    which contains the same information in a more readily accessible form.
    (Skip down to "EXAMPLES" below for more sample code.)

    XMLin() accepts an optional XML specifier followed by zero or more
    'name => value' option pairs. The XML specifier can be one of the
    following:

    A filename
        If the filename contains no directory components, XMLin() will
        look for the file in each directory in the SearchPath (see
        "OPTIONS" below) or in the current directory if the SearchPath
        option is not defined. eg:

            $ref = XMLin('/etc/params.xml');

        Note, the filename '-' can be used to parse from STDIN.

    undef
        If there is no XML specifier, XMLin() will check the script
        directory and each of the SearchPath directories for a file with
        the same name as the script but with the extension '.xml'. Note:
        if you wish to specify options, you must specify the value
        'undef'. eg:

            $ref = XMLin(undef, ForceArray => 1);

    A string of XML
        A string containing XML (recognised by the presence of '<' and
        '>' characters) will be parsed directly. eg:

            $ref = XMLin('<opt username="bob" password="flurp" />');

    An IO::Handle object
        An IO::Handle object will be read to EOF and its contents parsed.
        eg:

            $fh = IO::File->new('/etc/params.xml');
            $ref = XMLin($fh);

XMLout()
    Takes a data structure (generally a hashref) and returns an XML
    encoding of that structure. If the resulting XML is parsed using
    XMLin(), it should return a data structure equivalent to the original
    (see caveats below).

    The XMLout() function can also be used to output the XML as SAX events
    (see the "Handler" option and "SAX SUPPORT" for more details).

    When translating hashes to XML, hash keys which have a leading '-' will
    be silently skipped. This is the approved method for marking elements
    of a data structure which should be ignored by "XMLout". (Note: If
    these items were not skipped, the key names would be emitted as element
    or attribute names with a leading '-', which would not be valid XML.)

Caveats
    Some care is required in creating data structures which will be passed
    to XMLout(). Hash keys from the data structure will be encoded as
    either XML element names or attribute names. Therefore, you should use
    hash key names which conform to the relatively strict XML naming rules:

    Names in XML must begin with a letter. The remaining characters may be
    letters, digits, hyphens (-), underscores (_) or full stops (.). It is
    also allowable to include one colon (:) in an element name but this
    should only be used when working with namespaces (XML::Simple can only
    usefully work with namespaces when teamed with a SAX Parser).
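
    You can check the naming rule above before calling XMLout(). The sketch below assumes ASCII-only names. It follows the (deliberately conservative) rule as stated here rather than the full XML Name production. is_safe_xml_name() and unsafe_keys() are hypothetical helpers, not part of XML::Simple:

```perl
use strict;
use warnings;

# Conservative check following the rule stated above (ASCII only): a
# letter first, then letters, digits, '-', '_', '.' or ':', with at most
# one ':' in total. The full XML Name production is broader than this.
sub is_safe_xml_name {
    my ($name) = @_;
    return 0 unless defined $name
        && $name =~ /\A[A-Za-z][A-Za-z0-9._:-]*\z/;
    my $colons = () = $name =~ /:/g;    # count the colons
    return $colons <= 1 ? 1 : 0;
}

# Keys of a hashref that would not become valid names. Keys with a
# leading '-' are left out, because XMLout() skips them anyway.
sub unsafe_keys {
    my ($href) = @_;
    return grep { !/\A-/ && !is_safe_xml_name($_) } sort keys %$href;
}
```

    For example, unsafe_keys({ 'user-name' => 1, '2nd' => 1, '-skip' => 1 }) returns ('2nd'). The '-skip' key is not reported because XMLout() would skip it anyway.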

    You can use other punctuation characters in hash values (just not in
    hash keys); however, XML::Simple does not support dumping binary data.

    If you break these rules, the current implementation of XMLout() will
    simply emit non-compliant XML which will be rejected if you try to read
    it back in. (A later version of XML::Simple might take a more
    proactive approach.)

    Note also that although you can nest hashes and arrays to arbitrary
    levels, circular data structures are not supported and will cause
    XMLout() to die.

    If you wish to 'round-trip' arbitrary data structures from Perl to XML
    and back to Perl, then you should probably disable array folding (using
    the KeyAttr option) both with XMLout() and with XMLin(). If you still
    don't get the expected results, you may prefer to use XML::Dumper which
    is designed for exactly that purpose.

    Refer to "WHERE TO FROM HERE?" if XMLout() is too simple for your
    needs.

XML::Simple supports a number of options (in fact as each release of
XML::Simple adds more options, the module's claim to the name 'Simple'
becomes increasingly tenuous). If you find yourself repeatedly having
to specify the same options, you might like to investigate "OPTIONAL OO
INTERFACE" below.

If you can't be bothered reading the documentation, refer to "STRICT
MODE" to automatically catch common mistakes.

Because there are so many options, it's hard for new users to know
which ones are important, so here are the two you really need to know
about:

• check out "ForceArray" because you'll almost certainly want to turn
  it on

• make sure you know what the "KeyAttr" option does and what its
  default value is because it may surprise you otherwise (note in
  particular that 'KeyAttr' affects both "XMLin" and "XMLout")

The option name headings below have a trailing 'comment' - a hash
followed by two pieces of metadata:

• Options are marked with 'in' if they are recognised by XMLin() and
  'out' if they are recognised by XMLout().

• Each option is also flagged to indicate whether it is:

    'important'   - don't use the module until you understand this one
    'handy'       - you can skip this on the first time through
    'advanced'    - you can skip this on the second time through
    'SAX only'    - don't worry about this unless you're using SAX (or
                    alternatively if you need this, you also need SAX)
    'seldom used' - you'll probably never use this unless you were the
                    person that requested the feature

The options are listed alphabetically:

Note: option names are no longer case sensitive so you can use the
mixed case versions shown here; all lower case as required by versions
2.03 and earlier; or you can add underscores between the words (eg:
key_attr).

AttrIndent => 1 # out - handy
    When you are using XMLout(), enable this option to have attributes
    printed one-per-line with sensible indentation rather than all on one
    line.

Cache => [ cache schemes ] # in - advanced
    Because loading the XML::Parser module and parsing an XML file can
    consume a significant number of CPU cycles, it is often desirable to
    cache the output of XMLin() for later reuse.

    When parsing from a named file, XML::Simple supports a number of
    caching schemes. The 'Cache' option may be used to specify one or more
    schemes (using an anonymous array). Each scheme will be tried in turn
    in the hope of finding a cached pre-parsed representation of the XML
    file. If no cached copy is found, the file will be parsed and the
    first cache scheme in the list will be used to save a copy of the
    results. The following cache schemes have been implemented:

    storable
        Utilises Storable.pm to read/write a cache file with the same name
        as the XML file but with the extension .stor.

    memshare
        When a file is first parsed, a copy of the resulting data structure
        is retained in memory in the XML::Simple module's namespace.
        Subsequent calls to parse the same file will return a reference to
        this structure. This cached version will persist only for the life
        of the Perl interpreter (which in the case of mod_perl, for example,
        may be some significant time).

        Because each caller receives a reference to the same data
        structure, a change made by one caller will be visible to all. For
        this reason, the reference returned should be treated as read-only.

    memcopy
        This scheme works identically to 'memshare' (above) except that
        each caller receives a reference to a new data structure which is a
        copy of the cached version. Copying the data structure will add a
        little processing overhead, therefore this scheme should only be
        used where the caller intends to modify the data structure (or
        wishes to protect itself from others who might). This scheme uses
        Storable.pm to perform the copy.

    Warning! The memory-based caching schemes compare the timestamp on the
    file to the time when it was last parsed. If the file is stored on an
    NFS filesystem (or other network share) and the clock on the file
    server is not exactly synchronised with the clock where your script is
    run, updates to the source XML file may appear to be ignored.
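
    The practical difference between 'memshare' and 'memcopy' is whether callers share one reference or each get a deep copy. The sketch below is a toy cache that mimics only that difference. It ignores file timestamps and is not XML::Simple's implementation. The parser stub and the fetch_shared()/fetch_copy() names are hypothetical. The copy uses Storable's dclone(), in line with the Storable-based copying described above:

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Toy in-memory cache keyed by filename. It only mimics the sharing vs.
# copying behaviour of 'memshare' and 'memcopy'; timestamps are ignored.
my %cache;

# 'memshare'-style: every caller gets the very same reference.
sub fetch_shared {
    my ($file, $parse) = @_;
    $cache{$file} //= $parse->($file);
    return $cache{$file};
}

# 'memcopy'-style: every caller gets a private deep copy.
sub fetch_copy {
    my ($file, $parse) = @_;
    $cache{$file} //= $parse->($file);
    return dclone($cache{$file});
}

# Hypothetical parser stub standing in for a real XMLin() call.
my $parse = sub { return { logdir => '/var/log/foo/' } };

my $first  = fetch_shared('foo.xml', $parse);
my $second = fetch_shared('foo.xml', $parse);
$first->{logdir} = '/tmp/';                    # also changes what $second sees
print "shared: $second->{logdir}\n";           # prints: shared: /tmp/

my $copy = fetch_copy('foo.xml', $parse);
$copy->{logdir} = '/srv/';                     # the cached tree is not affected
print 'cached: ', $cache{'foo.xml'}{logdir}, "\n";   # prints: cached: /tmp/
```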

ContentKey => 'keyname' # in+out - seldom used
    When text content is parsed to a hash value, this option lets you
    specify a name for the hash key to override the default 'content'. So
    for example:

        XMLin('<opt one="1">Text</opt>', ContentKey => 'text')

    will parse to:

        { 'one' => 1, 'text' => 'Text' }

    instead of:

        { 'one' => 1, 'content' => 'Text' }

    XMLout() will also honour the value of this option when converting a
    hashref to XML.

    You can also prefix your selected key name with a '-' character to have
    XMLin() try a little harder to eliminate unnecessary 'content' keys
    after array folding. For example:

        XMLin(
          '<opt><item name="one">First</item><item name="two">Second</item></opt>',
          KeyAttr    => { item => 'name' },
          ForceArray => [ 'item' ],
          ContentKey => '-content'
        )

    will parse to:

        {
          'item' => {
            'one' => 'First',
            'two' => 'Second'
          }
        }

    rather than this (without the '-'):

        {
          'item' => {
            'one' => { 'content' => 'First' },
            'two' => { 'content' => 'Second' }
          }
        }
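
    You can picture the effect of the '-' prefix as one extra pass over the folded hash. The sketch below takes the folded structure shown above as its input. collapse_content() is a hypothetical helper that only approximates what XMLin() does internally:

```perl
use strict;
use warnings;

# Hypothetical helper: after folding, replace any value whose only key is
# the content key with the bare text, e.g. { content => 'First' } -> 'First'.
# Values carrying anything else (such as attributes) are left untouched.
sub collapse_content {
    my ($folded, $content_key) = @_;
    my %out;
    for my $k (keys %$folded) {
        my $v = $folded->{$k};
        my $only_content = ref $v eq 'HASH'
            && scalar(keys %$v) == 1
            && exists $v->{$content_key};
        $out{$k} = $only_content ? $v->{$content_key} : $v;
    }
    return \%out;
}

my $folded = {
    one => { content => 'First' },
    two => { content => 'Second' },
};
my $item = collapse_content($folded, 'content');
# $item is now { one => 'First', two => 'Second' }
```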

DataHandler => code_ref # in - SAX only
    When you use an XML::Simple object as a SAX handler, it will return a
    'simple tree' data structure in the same format as XMLin() would
    return. If this option is set (to a subroutine reference), then when
    the tree is built the subroutine will be called and passed two
    arguments: a reference to the XML::Simple object and a reference to the
    data tree. The return value from the subroutine will be returned to
    the SAX driver. (See "SAX SUPPORT" for more details.)

ForceArray => 1 # in - important
    This option should be set to '1' to force nested elements to be
    represented as arrays even when there is only one. Eg, with ForceArray
    enabled, this XML:

        <opt>
          <name>value</name>
        </opt>

    would parse to this:

        {
          'name' => [
            'value'
          ]
        }

    instead of this (the default):

        {
          'name' => 'value'
        }

    This option is especially useful if the data structure is likely to be
    written back out as XML and the default behaviour of rolling single
    nested elements up into attributes is not desirable.

    If you are using the array folding feature, you should almost certainly
    enable this option. If you do not, single nested elements will not be
    parsed to arrays and therefore will not be candidates for folding to a
    hash. (Given that the default value of 'KeyAttr' enables array
    folding, the default value of this option should probably have been
    enabled too - sorry.)

ForceArray => [ names ] # in - important
    This alternative (and preferred) form of the 'ForceArray' option allows
    you to specify a list of element names which should always be forced
    into an array representation, rather than the 'all or nothing' approach
    above.

    It is also possible (since version 2.05) to include compiled regular
    expressions in the list - any element names which match the pattern
    will be forced to arrays. If the list contains only a single regex,
    then it is not necessary to enclose it in an arrayref. Eg:

        ForceArray => qr/_list$/
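
    Both forms of ForceArray describe the same per-element decision, which can be sketched as follows. The sketch assumes a spec that is a plain true/false value, an arrayref of names and qr// patterns, or a single qr// pattern. force_array_for() and apply_force_array() are hypothetical names, not XML::Simple functions:

```perl
use strict;
use warnings;

# Hypothetical helper: should element $name always become an arrayref?
# $spec may be a plain true/false value (ForceArray => 1), an arrayref of
# names and/or qr// patterns, or a single qr// pattern.
sub force_array_for {
    my ($name, $spec) = @_;
    return $spec ? 1 : 0 unless ref $spec;
    my @rules = ref $spec eq 'ARRAY' ? @$spec : ($spec);
    for my $rule (@rules) {
        my $hit = ref $rule eq 'Regexp' ? $name =~ $rule : $name eq $rule;
        return 1 if $hit;
    }
    return 0;
}

# Wrap a lone value in an arrayref when the rule says so.
sub apply_force_array {
    my ($name, $value, $spec) = @_;
    return $value if ref $value eq 'ARRAY' || !force_array_for($name, $spec);
    return [ $value ];
}
```

    For instance, apply_force_array('name', 'value', 1) yields [ 'value' ], which matches the ForceArray => 1 example above, while apply_force_array('name', 'value', [ 'server' ]) leaves the scalar alone.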

ForceContent => 1 # in - seldom used
    When XMLin() parses elements which have text content as well as
    attributes, the text content must be represented as a hash value rather
    than a simple scalar. This option allows you to force text content to
    always parse to a hash value even when there are no attributes. So for
    example:

        XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)

    will parse to:

        {
          'x' => { 'content' => 'text1' },
          'y' => { 'a' => 2, 'content' => 'text2' }
        }

    instead of:

        {
          'x' => 'text1',
          'y' => { 'a' => 2, 'content' => 'text2' }
        }

GroupTags => { grouping tag => grouped tag } # in+out - handy
    You can use this option to eliminate extra levels of indirection in
    your Perl data structure. For example this XML:

        <opt>
          <searchpath>
            <dir>/usr/bin</dir>
            <dir>/usr/local/bin</dir>
            <dir>/usr/X11/bin</dir>
          </searchpath>
        </opt>

    would normally be read into a structure like this:

        {
          searchpath => {
            dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
          }
        }

    But when read in with the appropriate value for 'GroupTags':

        my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });

    it will return this simpler structure:

        {
          searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
        }

    The grouping element ("<searchpath>" in the example) must not contain
    any attributes or elements other than the grouped element.

    You can specify multiple 'grouping element' to 'grouped element'
    mappings in the same hashref. If this option is combined with
    "KeyAttr", the array folding will occur first and then the grouped
    element names will be eliminated.

    "XMLout" will also use the grouptag mappings to re-introduce the tags
    around the grouped elements. Beware though that this will occur in all
    places that the 'grouping tag' name occurs - you probably don't want to
    use the same name for elements as well as attributes.
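
    The GroupTags collapse can be sketched as follows. The sketch works on the top level of an already-parsed tree and honours the restriction that the grouping element holds nothing but the grouped element. apply_group_tags() is a hypothetical helper that only looks at top-level keys. XMLin() performs this step itself, after array folding:

```perl
use strict;
use warnings;

# Hypothetical helper: collapse grouping elements at the top level of an
# already-parsed tree (modified in place and returned), e.g.
# { searchpath => { dir => [...] } } becomes { searchpath => [...] }.
# A grouping element holding anything besides the grouped element is skipped.
sub apply_group_tags {
    my ($tree, $groups) = @_;
    for my $outer (keys %$groups) {
        my $inner = $groups->{$outer};
        my $node  = $tree->{$outer};
        next unless ref $node eq 'HASH'
            && scalar(keys %$node) == 1
            && exists $node->{$inner};
        $tree->{$outer} = $node->{$inner};
    }
    return $tree;
}

my $opt = apply_group_tags(
    { searchpath => { dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ] } },
    { searchpath => 'dir' },
);
# $opt->{searchpath} is now [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
```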

Handler => object_ref # out - SAX only
    Use the 'Handler' option to have XMLout() generate SAX events rather
    than returning a string of XML. For more details see "SAX SUPPORT"
    below.

    Note: the current implementation of this option generates a string of
    XML and uses a SAX parser to translate it into SAX events. The normal
    encoding rules apply here - your data must be UTF8 encoded unless you
    specify an alternative encoding via the 'XMLDecl' option; and by the
    time the data reaches the handler object, it will be in UTF8 form
    regardless of the encoding you supply. A future implementation of this
    option may generate the events directly.

KeepRoot => 1 # in+out - handy
    In its attempt to return a data structure free of superfluous detail
    and unnecessary levels of indirection, XMLin() normally discards the
    root element name. Setting the 'KeepRoot' option to '1' will cause the
    root element name to be retained. So after executing this code:

        $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)

    you'll be able to reference the tempdir as
    "$config->{config}->{tempdir}" instead of the default
    "$config->{tempdir}".

    Similarly, setting the 'KeepRoot' option to '1' will tell XMLout() that
    the data structure already contains a root element name and it is not
    necessary to add another.

KeyAttr => [ list ] # in+out - important
    This option controls the 'array folding' feature which translates
    nested elements from an array to a hash. It also controls the
    'unfolding' of hashes to arrays.

    For example, this XML:

        <opt>
          <user login="grep" fullname="Gary R Epstein" />
          <user login="stty" fullname="Simon T Tyson" />
        </opt>

    would, by default, parse to this:

        {
          'user' => [
            {
              'login' => 'grep',
              'fullname' => 'Gary R Epstein'
            },
            {
              'login' => 'stty',
              'fullname' => 'Simon T Tyson'
            }
          ]
        }

    If the option 'KeyAttr => "login"' were used to specify that the
    'login' attribute is a key, the same XML would parse to:

        {
          'user' => {
            'stty' => {
              'fullname' => 'Simon T Tyson'
            },
            'grep' => {
              'fullname' => 'Gary R Epstein'
            }
          }
        }

    The key attribute names should be supplied in an arrayref if there is
    more than one. XMLin() will attempt to match attribute names in the
    order supplied. XMLout() will use the first attribute name supplied
    when 'unfolding' a hash into an array.

    Note 1: The default value for 'KeyAttr' is ['name', 'key', 'id']. If
    you do not want folding on input or unfolding on output you must set
    this option to an empty list to disable the feature.

    Note 2: If you wish to use this option, you should also enable the
    "ForceArray" option. Without 'ForceArray', a single nested element
    will be rolled up into a scalar rather than an array and therefore will
    not be folded (since only arrays get folded).

KeyAttr => { list } # in+out - important
    This alternative (and preferred) method of specifying the key
    attributes allows more fine grained control over which elements are
    folded and on which attributes. For example, the option
    'KeyAttr => { package => 'id' }' will cause any package elements to be
    folded on the 'id' attribute. No other elements which have an 'id'
    attribute will be folded at all.

    Note: XMLin() will generate a warning (or a fatal error in "STRICT
    MODE") if this syntax is used and an element which does not have the
    specified key attribute is encountered (eg: a 'package' element without
    an 'id' attribute, to use the example above). Warnings can be
    suppressed with the lexical "no warnings;" pragma or "no warnings
    'XML::Simple';".

    Two further variations are made possible by prefixing a '+' or a '-'
    character to the attribute name:

    The option 'KeyAttr => { user => "+login" }' will cause this XML:

        <opt>
          <user login="grep" fullname="Gary R Epstein" />
          <user login="stty" fullname="Simon T Tyson" />
        </opt>

    to parse to this data structure:

        {
          'user' => {
            'stty' => {
              'fullname' => 'Simon T Tyson',
              'login'    => 'stty'
            },
            'grep' => {
              'fullname' => 'Gary R Epstein',
              'login'    => 'grep'
            }
          }
        }

    The '+' indicates that the value of the key attribute should be copied
    rather than moved to the folded hash key.

    A '-' prefix would produce this result:

        {
          'user' => {
            'stty' => {
              'fullname' => 'Simon T Tyson',
              '-login'   => 'stty'
            },
            'grep' => {
              'fullname' => 'Gary R Epstein',
              '-login'   => 'grep'
            }
          }
        }

    As described earlier, "XMLout" will ignore hash keys starting with a
    '-'.
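
    The folding rules above, including the '+' and '-' prefixes, can be sketched for a single list of records. This is an illustrative re-implementation under simplifying assumptions (one key attribute, records already parsed into hashrefs), not XML::Simple's own code. fold_array() is a hypothetical name. Where XMLin() would warn about a record lacking the key attribute, this sketch simply returns the array unfolded:

```perl
use strict;
use warnings;

# Hypothetical helper: fold an arrayref of hashrefs into a hashref keyed on
# $attr. A leading '+' copies the key attribute into each record; a leading
# '-' keeps it under a '-'-prefixed name; otherwise it is moved out.
sub fold_array {
    my ($records, $attr) = @_;
    my ($prefix, $key) = $attr =~ /\A([+-]?)(.+)\z/;
    return $records
        if grep { ref $_ ne 'HASH' || !defined $_->{$key} } @$records;

    my %folded;
    for my $rec (@$records) {
        my %copy = %$rec;                       # shallow copy; input untouched
        my $id   = $prefix eq '+' ? $copy{$key} : delete $copy{$key};
        $copy{"-$key"} = $id if $prefix eq '-';
        $folded{$id} = \%copy;
    }
    return \%folded;
}

my $users = [
    { login => 'grep', fullname => 'Gary R Epstein' },
    { login => 'stty', fullname => 'Simon T Tyson' },
];
my $by_login = fold_array($users, 'login');
# $by_login->{stty}{fullname} is 'Simon T Tyson'; 'login' has been moved out
```

    If you call it with '+login', the records keep their 'login' entry. With '-login', the entry is stored as '-login', which XMLout() will then skip.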

NoAttr => 1 # in+out - handy
    When used with XMLout(), the generated XML will contain no attributes.
    All hash key/values will be represented as nested elements instead.

    When used with XMLin(), any attributes in the XML will be ignored.

NoEscape => 1 # out - seldom used
    By default, XMLout() will translate the characters '<', '>', '&' and
    '"' to '&lt;', '&gt;', '&amp;' and '&quot;' respectively. Use this
    option to suppress escaping (presumably because you've already escaped
    the data in some more sophisticated manner).
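
    For reference, the default escaping amounts to a four-character substitution. The sketch below uses a hypothetical escape_xml() helper; XMLout() has its own internal routine:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the default escaping described above.
my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

sub escape_xml {
    my ($text) = @_;
    # One pass over the string: each special character is replaced once,
    # so the '&' inside an inserted '&lt;' is never re-escaped.
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

print escape_xml('a < b && "c" > d'), "\n";
# prints: a &lt; b &amp;&amp; &quot;c&quot; &gt; d
```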

NoIndent => 1 # out - seldom used
    Set this option to 1 to disable XMLout()'s default 'pretty printing'
    mode. With this option enabled, the XML output will all be on one line
    (unless there are newlines in the data) - this may be easier for
    downstream processing.

NoSort => 1 # out - seldom used
    Newer versions of XML::Simple sort elements and attributes
    alphabetically (*), by default. Enable this option to suppress the
    sorting - possibly for backwards compatibility.

    * Actually, sorting is alphabetical but 'key' attribute or element
    names (as in 'KeyAttr') sort first. Also, when a hash of hashes is
    'unfolded', the elements are sorted alphabetically by the value of the
    key field.

NormaliseSpace => 0 | 1 | 2 # in - handy
    This option controls how whitespace in text content is handled.
    Recognised values for the option are:

    • 0 = (default) whitespace is passed through unaltered (except of
      course for the normalisation of whitespace in attribute values
      which is mandated by the XML recommendation)

    • 1 = whitespace is normalised in any value used as a hash key
      (normalising means removing leading and trailing whitespace and
      collapsing sequences of whitespace characters to a single space)

    • 2 = whitespace is normalised in all text content

    Note: you can spell this option with a 'z' if that is more natural for
    you.
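
    'Normalising', as defined in the list above, takes three substitutions. normalise_space() is a hypothetical helper. The 1 and 2 settings select which values it is applied to:

```perl
use strict;
use warnings;

# Hypothetical helper: trim leading/trailing whitespace and collapse each
# internal run of whitespace to a single space.
sub normalise_space {
    my ($text) = @_;
    return $text unless defined $text;
    $text =~ s/\A\s+//;
    $text =~ s/\s+\z//;
    $text =~ s/\s+/ /g;
    return $text;
}

print '[', normalise_space("  kalahari \n\t server "), "]\n";
# prints: [kalahari server]
```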

NSExpand => 1 # in+out - handy - SAX only
    This option controls namespace expansion - the translation of element
    and attribute names of the form 'prefix:name' to '{uri}name'. For
    example the element name 'xsl:template' might be expanded to:
    '{http://www.w3.org/1999/XSL/Transform}template'.

    By default, XMLin() will return element names and attribute names
    exactly as they appear in the XML. Setting this option to 1 will cause
    all element and attribute names to be expanded to include their
    namespace URI.

    Note: You must be using a SAX parser for this option to work (ie: it
    does not work with XML::Parser).

    This option also controls whether XMLout() performs the reverse
    translation from '{uri}name' back to 'prefix:name'. The default is no
    translation. If your data contains expanded names, you should set this
    option to 1; otherwise "XMLout" will emit XML which is not well formed.

    Note: You must have the XML::NamespaceSupport module installed if you
    want XMLout() to translate URIs back to prefixes.

NumericEscape => 0 | 1 | 2 # out - handy
    Use this option to have 'high' (non-ASCII) characters in your Perl data
    structure converted to numeric entities (eg: &#8364;) in the XML
    output. Three levels are possible:

    0 - default: no numeric escaping (OK if you're writing out UTF8)

    1 - only characters above 0xFF are escaped (ie: characters in the
    0x80-FF range are not escaped), possibly useful with ISO8859-1 output

    2 - all characters above 0x7F are escaped (good for plain ASCII output)
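
    The three levels correspond to two character ranges. The sketch below produces decimal references like the '&#8364;' example above, assuming the input is a decoded Perl character string rather than raw bytes. numeric_escape() is a hypothetical helper, not XML::Simple's internal function:

```perl
use strict;
use warnings;

# Hypothetical helper: level 0 leaves the text alone, level 1 escapes
# characters above 0xFF, level 2 escapes everything above 0x7F. Output
# uses decimal references such as '&#8364;'.
sub numeric_escape {
    my ($text, $level) = @_;
    if ($level == 1) {
        $text =~ s/([^\x00-\xFF])/'&#' . ord($1) . ';'/ge;
    }
    elsif ($level == 2) {
        $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    }
    return $text;
}

# 'caf' + e-acute (U+00E9), a space, the euro sign (U+20AC), '5'
my $price = "caf\x{e9} \x{20ac}5";
print numeric_escape($price, 2), "\n";    # prints: caf&#233; &#8364;5
```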

OutputFile => <file specifier> # out - handy
    The default behaviour of XMLout() is to return the XML as a string. If
    you wish to write the XML to a file, simply supply the filename using
    the 'OutputFile' option.

    This option also accepts an IO handle object - especially useful in
    Perl 5.8.0 and later for output using an encoding other than UTF-8, eg:

        open my $fh, '>:encoding(iso-8859-1)', $path or die "open($path): $!";
        XMLout($ref, OutputFile => $fh);

    Note, XML::Simple does not require that the object you pass in to the
    OutputFile option inherits from IO::Handle - it simply assumes the
    object supports a "print" method.

ParserOpts => [ XML::Parser Options ] # in - don't use this
    Note: This option is now officially deprecated. If you find it useful,
    email the author with an example of what you use it for. Do not use
    this option to set the ProtocolEncoding; that's just plain wrong - fix
    the XML.

    This option allows you to pass parameters to the constructor of the
    underlying XML::Parser object (which of course assumes you're not using
    SAX).

RootName => 'string' # out - handy
    By default, when XMLout() generates XML, the root element will be named
    'opt'. This option allows you to specify an alternative name.

    Specifying either undef or the empty string for the RootName option
    will produce XML with no root element. In most cases the resulting
    XML fragment will not be 'well formed' and therefore could not be read
    back in by XMLin(). Nevertheless, the option has been found to be
    useful in certain circumstances.

SearchPath => [ list ] # in - handy
    If you pass XMLin() a filename, but the filename includes no directory
    component, you can use this option to specify which directories should
    be searched to locate the file. You might use this option to search
    first in the user's home directory, then in a global directory such as
    /etc.

    If a filename is provided to XMLin() but SearchPath is not defined, the
    file is assumed to be in the current directory.

    If the first parameter to XMLin() is undefined, the default SearchPath
    will contain only the directory in which the script itself is located.
    Otherwise the default SearchPath will be empty.
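
    The lookup rules above can be sketched with the core File::Spec module. find_xml_file() is a hypothetical helper. It assumes the caller has already built the search list (for example, the script's directory when XMLin() was given undef). XML::Simple's own lookup may differ in details such as error reporting:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper following the rules above: a name with a directory
# part is used as given; a bare name is tried in each search directory in
# turn, or in the current directory when no search path is supplied.
# Returns the path found, or nothing.
sub find_xml_file {
    my ($file, @search_path) = @_;
    my (undef, $dirs) = File::Spec->splitpath($file);
    if (length $dirs) {
        return -e $file ? $file : ();
    }
    @search_path = (File::Spec->curdir) unless @search_path;
    for my $dir (@search_path) {
        my $candidate = File::Spec->catfile($dir, $file);
        return $candidate if -e $candidate;
    }
    return;
}

# Example: a hypothetical per-user directory first, then /etc.
my $home = $ENV{HOME} // File::Spec->curdir;
my $path = find_xml_file('params.xml', File::Spec->catdir($home, '.config'), '/etc');
print defined $path ? "found $path\n" : "params.xml not found\n";
```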
779
780 StrictMode => 1 | 0 # in+out seldom used
781 This option allows you to turn "STRICT MODE" on or off for a particular
782 call, regardless of whether it was enabled at the time XML::Simple was
783 loaded.
784
785 SuppressEmpty => 1 | '' | undef # in+out - handy
786 This option controls what XMLin() should do with empty elements (no
787 attributes and no content). The default behaviour is to represent them
788 as empty hashes. Setting this option to a true value (eg: 1) will
789 cause empty elements to be skipped altogether. Setting the option to
790 'undef' or the empty string will cause empty elements to be represented
791 as the undefined value or the empty string respectively. The latter
792 two alternatives are a little easier to test for in your code than a
793 hash with no keys.
794
795 The option also controls what XMLout() does with undefined values.
796 Setting the option to undef causes undefined values to be output as
797 empty elements (rather than empty attributes), it also suppresses the
798 generation of warnings about undefined values. Setting the option to a
799 true value (eg: 1) causes undefined values to be skipped altogether on
800 output.
801
802 ValueAttr => [ names ] # in - handy
803 Use this option to deal elements which always have a single attribute
804 and no content. Eg:
805
806 <opt>
807 <colour value="red" />
808 <size value="XXL" />
809 </opt>
810
811 Setting "ValueAttr => [ 'value' ]" will cause the above XML to parse
812 to:
813
814 {
815 colour => 'red',
816 size => 'XXL'
817 }
818
819 instead of this (the default):
820
821 {
822 colour => { value => 'red' },
823 size => { value => 'XXL' }
824 }
825
826 Note: This form of the ValueAttr option is not compatible with XMLout()
827 - since the attribute name is discarded at parse time, the original XML
828 cannot be reconstructed.
829
830 ValueAttr => { element => attribute, ... } # in+out - handy
831 This (preferred) form of the ValueAttr option requires you to specify
832 both the element and the attribute names. This is not only safer, it
833 also allows the original XML to be reconstructed by XMLout().
834
835 Note: You probably don't want to use this option and the NoAttr option
836 at the same time.
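The following sketch contrasts the two forms of ValueAttr on the colour/size document above, assuming XML::Simple is installed. The array form only helps when parsing; the hash form also lets XMLout() rebuild the value attributes. Indentation and layout of the generated XML follow XMLout()'s own defaults, so only the relevant elements should be relied on.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><colour value="red" /><size value="XXL" /></opt>';

# Array form: any single-attribute element whose attribute is listed
# is rolled up to the bare value. The attribute name is lost.
my $flat = XMLin($xml, KeyAttr => [], ForceArray => 0,
                 ValueAttr => [ 'value' ]);
print "$flat->{colour} $flat->{size}\n";

# Hash form: element => attribute pairs, usable in both directions.
my %va   = (colour => 'value', size => 'value');
my $ref  = XMLin($xml, KeyAttr => [], ForceArray => 0, ValueAttr => \%va);
my $back = XMLout($ref, KeyAttr => [], ValueAttr => \%va);
print $back;   # <colour> and <size> regain their value attributes
```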
837
838 Variables => { name => value } # in - handy
This option allows variables in the XML to be expanded when the file is
read. (There is no facility for putting the variable names back if you
regenerate XML using "XMLout()".)
842
843 A 'variable' is any text of the form "${name}" which occurs in an
844 attribute value or in the text content of an element. If 'name'
845 matches a key in the supplied hashref, "${name}" will be replaced with
846 the corresponding value from the hashref. If no matching key is found,
847 the variable will not be replaced. Names must match the regex:
848 "[\w.]+" (ie: only 'word' characters and dots are allowed).
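A short sketch of variable expansion in both attribute values and text content, assuming XML::Simple is installed; the variable names base and unknown and the paths are made up. Note the single-quoted Perl string, which stops Perl from interpolating ${...} before XML::Simple sees it.

```perl
use strict;
use warnings;
use XML::Simple;

# Single quotes: ${base} and ${unknown} reach XML::Simple untouched.
my $xml = '<opt logdir="${base}/log">'
        . '<tmp>${base}/tmp</tmp>'
        . '<other>${unknown}</other>'
        . '</opt>';

my $ref = XMLin($xml, KeyAttr => [], ForceArray => 0,
                Variables => { base => '/srv/app' });

print "$ref->{logdir}\n";   # expanded in an attribute value
print "$ref->{tmp}\n";      # expanded in text content
print "$ref->{other}\n";    # no matching key, so left as written
```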
849
850 VarAttr => 'attr_name' # in - handy
851 In addition to the variables defined using "Variables", this option
852 allows variables to be defined in the XML. A variable definition
853 consists of an element with an attribute called 'attr_name' (the value
854 of the "VarAttr" option). The value of the attribute will be used as
855 the variable name and the text content of the element will be used as
856 the value. A variable defined in this way will override a variable
857 defined using the "Variables" option. For example:
858
859 XMLin( '<opt>
860 <dir name="prefix">/usr/local/apache</dir>
861 <dir name="exec_prefix">${prefix}</dir>
862 <dir name="bindir">${exec_prefix}/bin</dir>
863 </opt>',
864 VarAttr => 'name', ContentKey => '-content'
865 );
866
867 produces the following data structure:
868
869 {
870 dir => {
871 prefix => '/usr/local/apache',
872 exec_prefix => '/usr/local/apache',
873 bindir => '/usr/local/apache/bin',
874 }
875 }
876
877 XMLDecl => 1 or XMLDecl => 'string' # out - handy
878 If you want the output from XMLout() to start with the optional XML
879 declaration, simply set the option to '1'. The default XML declaration
880 is:
881
882 <?xml version='1.0' standalone='yes'?>
883
884 If you want some other string (for example to declare an encoding
885 value), set the value of this option to the complete string you
886 require.
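A sketch of both forms of XMLDecl, assuming XML::Simple is installed; the data and the UTF-8 declaration string are illustrative only. A custom string is used as given, so it is up to you to make it agree with how the output is actually encoded.

```perl
use strict;
use warnings;
use XML::Simple;

my $data = { name => 'sahara' };

# XMLDecl => 1 prepends the default declaration.
my $with_default = XMLout($data, KeyAttr => [], XMLDecl => 1);

# Any other string is used as the complete declaration.
my $with_custom = XMLout($data, KeyAttr => [],
    XMLDecl => '<?xml version="1.0" encoding="UTF-8"?>');

print $with_default, $with_custom;
```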
887
OPTIONAL OO INTERFACE
The procedural interface is both simple and convenient; however, there
are a couple of reasons why you might prefer to use the object-oriented
(OO) interface:
892
893 • to define a set of default values which should be used on all
894 subsequent calls to XMLin() or XMLout()
895
896 • to override methods in XML::Simple to provide customised behaviour
897
898 The default values for the options described above are unlikely to suit
899 everyone. The OO interface allows you to effectively override
900 XML::Simple's defaults with your preferred values. It works like this:
901
902 First create an XML::Simple parser object with your preferred defaults:
903
904 my $xs = XML::Simple->new(ForceArray => 1, KeepRoot => 1);
905
906 then call XMLin() or XMLout() as a method of that object:
907
908 my $ref = $xs->XMLin($xml);
909 my $xml = $xs->XMLout($ref);
910
911 You can also specify options when you make the method calls and these
912 values will be merged with the values specified when the object was
913 created. Values specified in a method call take precedence.
914
915 Note: when called as methods, the XMLin() and XMLout() routines may be
916 called as xml_in() or xml_out(). The method names are aliased so the
917 only difference is the aesthetics.
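A minimal sketch of object-wide defaults and per-call overrides, assuming XML::Simple is installed; the one-element <server> document is a cut-down version of the foo.xml example from the quick start.

```perl
use strict;
use warnings;
use XML::Simple;

# Defaults for every call made through this object.
my $xs = XML::Simple->new(ForceArray => 1, KeyAttr => []);

my $xml = '<opt><server>sahara</server></opt>';

my $with_defaults = $xs->XMLin($xml);                  # ForceArray => 1 applies
my $overridden    = $xs->XMLin($xml, ForceArray => 0); # the call option wins
my $aliased       = $xs->xml_in($xml);                 # same method, other name

print ref($with_defaults->{server}), "\n";   # ARRAY
print $overridden->{server}, "\n";           # sahara
```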
918
919 Parsing Methods
920 You can explicitly call one of the following methods rather than rely
921 on the xml_in() method automatically determining whether the target to
922 be parsed is a string, a file or a filehandle:
923
924 parse_string(text)
925 Works exactly like the xml_in() method but assumes the first
926 argument is a string of XML (or a reference to a scalar containing
927 a string of XML).
928
929 parse_file(filename)
930 Works exactly like the xml_in() method but assumes the first
931 argument is the name of a file containing XML.
932
933 parse_fh(file_handle)
934 Works exactly like the xml_in() method but assumes the first
935 argument is a filehandle which can be read to get XML.
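A sketch of the three explicit parsing methods, assuming XML::Simple is installed and using the core File::Temp module for a scratch file; the host attribute is invented. XMLin() has to guess whether its argument is XML text, a filename or a filehandle; these methods skip the guess.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Simple;

my $xs  = XML::Simple->new(KeyAttr => [], ForceArray => 0);
my $xml = '<opt host="gobi" />';

# The argument is always treated as XML text.
my $from_string = $xs->parse_string($xml);

# Write the same document to a temporary file and parse it by name.
my ($fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} $xml;
close $fh or die "close: $!";
my $from_file = $xs->parse_file($path);

# Read from a handle that is already open.
open my $in, '<', $path or die "open $path: $!";
my $from_fh = $xs->parse_fh($in);
close $in;

print join(' ', map { $_->{host} } $from_string, $from_file, $from_fh), "\n";
```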
936
937 Hook Methods
938 You can make your own class which inherits from XML::Simple and
939 overrides certain behaviours. The following methods may provide useful
940 'hooks' upon which to hang your modified behaviour. You may find other
941 undocumented methods by examining the source, but those may be subject
942 to change in future releases.
943
944 new_xml_parser()
945 This method will be called when a new XML::Parser object must be
946 constructed (either because XML::SAX is not installed or
947 XML::Parser is preferred).
948
949 handle_options(direction, name => value ...)
950 This method will be called when one of the parsing methods or the
951 XMLout() method is called. The initial argument will be a string
(either 'in' or 'out') and the remaining arguments will be name/value
pairs.
954
955 default_config_file()
956 Calculates and returns the name of the file which should be parsed
957 if no filename is passed to XMLin() (default: "$0.xml").
958
959 build_simple_tree(filename, string)
960 Called from XMLin() or any of the parsing methods. Takes either a
961 file name as the first argument or "undef" followed by a 'string'
962 as the second argument. Returns a simple tree data structure. You
963 could override this method to apply your own transformations before
964 the data structure is returned to the caller.
965
966 new_hashref()
967 When the 'simple tree' data structure is being built, this method
968 will be called to create any required anonymous hashrefs.
969
970 sorted_keys(name, hashref)
971 Called when XMLout() is translating a hashref to XML. This routine
972 returns a list of hash keys in the order that the corresponding
973 attributes/elements should appear in the output.
974
975 escape_value(string)
976 Called from XMLout(), takes a string and returns a copy of the
977 string with XML character escaping rules applied.
978
979 escape_attr(string)
980 Called from XMLout(), to handle attribute values. By default, just
981 calls escape_value(), but you can override this method if you want
982 attributes escaped differently than text content.
983
984 numeric_escape(string)
985 Called from escape_value(), to handle non-ASCII characters
986 (depending on the value of the NumericEscape option).
987
988 copy_hash(hashref, extra_key => value, ...)
989 Called from XMLout(), when 'unfolding' a hash of hashes into an
990 array of hashes. You might wish to override this method if you're
991 using tied hashes and don't want them to get untied.
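A sketch of customising XMLout() through a subclass, assuming XML::Simple and the core parent pragma are available. My::ReverseSimple is a hypothetical class name and reverse-alphabetical order is just an easy-to-see effect; only sorted_keys() is overridden, using the (name, hashref) arguments described above.

```perl
use strict;
use warnings;

# Hypothetical subclass: only the sorted_keys() hook is replaced.
package My::ReverseSimple;
use parent 'XML::Simple';

sub sorted_keys {
    my ($self, $name, $hashref) = @_;
    # Emit attributes/elements in reverse alphabetical order.
    return reverse sort keys %$hashref;
}

package main;

my $xs  = My::ReverseSimple->new(KeyAttr => []);
my $xml = $xs->XMLout({ alpha => 1, beta => 2, gamma => 3 });
print $xml;   # attributes appear as gamma, beta, alpha
```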
992
993 Cache Methods
994 XML::Simple implements three caching schemes ('storable', 'memshare'
995 and 'memcopy'). You can implement a custom caching scheme by
996 implementing two methods - one for reading from the cache and one for
997 writing to it.
998
999 For example, you might implement a new 'dbm' scheme that stores cached
1000 data structures using the MLDBM module. First, you would add a
1001 cache_read_dbm() method which accepted a filename for use as a lookup
1002 key and returned a data structure on success, or undef on failure.
Then, you would implement a cache_write_dbm() method which accepted a
data structure and a filename.
1005
1006 You would use this caching scheme by specifying the option:
1007
1008 Cache => [ 'dbm' ]
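A sketch of the two-method protocol described above. Instead of MLDBM it uses a hypothetical 'memo' scheme that keeps trees in a process-local hash, so it needs only XML::Simple and core modules. My::MemoSimple, cache_read_memo() and cache_write_memo() are invented names following the same cache_read_/cache_write_ naming as the 'dbm' example. Unlike a real cache it never checks whether the file has changed, and the example deliberately exposes that.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical subclass adding a 'memo' caching scheme.
package My::MemoSimple;
use parent 'XML::Simple';

my %MEMO;   # filename => parsed tree, for the life of the process

sub cache_read_memo {
    my ($self, $filename) = @_;
    return $MEMO{$filename};        # undef on a cache miss
}

sub cache_write_memo {
    my ($self, $data, $filename) = @_;
    $MEMO{$filename} = $data;
}

package main;

my ($fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<opt version="old" />';
close $fh or die "close: $!";

my $xs = My::MemoSimple->new(KeyAttr => [], ForceArray => 0,
                             Cache => [ 'memo' ]);
my $first = $xs->XMLin($path);      # parsed, then written to the memo

# Change the file on disk; the memo scheme has no staleness check.
open my $out, '>', $path or die "open $path: $!";
print {$out} '<opt version="new" />';
close $out or die "close: $!";

my $second = $xs->XMLin($path);     # served from the memo
print "$first->{version} $second->{version}\n";
```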
1009
STRICT MODE
If you import the XML::Simple routines like this:
1012
1013 use XML::Simple qw(:strict);
1014
the following common mistakes will be detected and treated as fatal
errors:
1017
• Failing to explicitly set the "KeyAttr" option - if you can't be
bothered reading about this option, turn it off with: KeyAttr => []
1021
1022 • Failing to explicitly set the "ForceArray" option - if you can't be
1023 bothered reading about this option, set it to the safest mode with:
1024 ForceArray => 1
1025
1026 • Setting ForceArray to an array, but failing to list all the
1027 elements from the KeyAttr hash.
1028
• Data error - KeyAttr is set to, say, { part => 'partnum' } but the
1030 XML contains one or more <part> elements without a 'partnum'
1031 attribute (or nested element). Note: if strict mode is not set but
1032 "use warnings;" is in force, this condition triggers a warning.
1033
1034 • Data error - as above, but non-unique values are present in the key
1035 attribute (eg: more than one <part> element with the same partnum).
1036 This will also trigger a warning if strict mode is not enabled.
1037
• Data error - as above, but the value of the key attribute (eg:
partnum) is not a scalar string (due to nested elements etc). This
will also trigger a warning if strict mode is not enabled.
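A sketch of strict mode catching both a missing option and a data error, assuming XML::Simple is installed; the <part> document is made up. Each failing call is wrapped in eval so the script can report the error and carry on.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

# The second <part> has no partnum attribute.
my $xml = '<opt><part partnum="555-1234" desc="widget" />'
        . '<part desc="gadget" /></opt>';

# Omitting KeyAttr (and ForceArray) is fatal under :strict.
my $ref = eval { XMLin($xml) };
print "missing option: $@" unless defined $ref;

# Both options set, but folding on partnum hits the second <part>,
# which strict mode treats as a data error.
$ref = eval {
    XMLin($xml, KeyAttr => { part => 'partnum' }, ForceArray => [ 'part' ]);
};
print "data error: $@" unless defined $ref;

# With folding turned off the same document parses cleanly.
$ref = XMLin($xml, KeyAttr => [], ForceArray => [ 'part' ]);
print scalar(@{ $ref->{part} }), " parts\n";
```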
1041
SAX SUPPORT
From version 1.08_01, XML::Simple includes support for SAX (the Simple
1044 API for XML) - specifically SAX2.
1045
1046 In a typical SAX application, an XML parser (or SAX 'driver') module
1047 generates SAX events (start of element, character data, end of element,
1048 etc) as it parses an XML document and a 'handler' module processes the
1049 events to extract the required data. This simple model allows for some
1050 interesting and powerful possibilities:
1051
1052 • Applications written to the SAX API can extract data from huge XML
1053 documents without the memory overheads of a DOM or tree API.
1054
1055 • The SAX API allows for plug and play interchange of parser modules
1056 without having to change your code to fit a new module's API. A
1057 number of SAX parsers are available with capabilities ranging from
1058 extreme portability to blazing performance.
1059
1060 • A SAX 'filter' module can implement both a handler interface for
1061 receiving data and a generator interface for passing modified data
1062 on to a downstream handler. Filters can be chained together in
1063 'pipelines'.
1064
1065 • One filter module might split a data stream to direct data to two
1066 or more downstream handlers.
1067
1068 • Generating SAX events is not the exclusive preserve of XML parsing
1069 modules. For example, a module might extract data from a
1070 relational database using DBI and pass it on to a SAX pipeline for
1071 filtering and formatting.
1072
1073 XML::Simple can operate at either end of a SAX pipeline. For example,
1074 you can take a data structure in the form of a hashref and pass it into
1075 a SAX pipeline using the 'Handler' option on XMLout():
1076
1077 use XML::Simple;
1078 use Some::SAX::Filter;
1079 use XML::SAX::Writer;
1080
1081 my $ref = {
1082 .... # your data here
1083 };
1084
1085 my $writer = XML::SAX::Writer->new();
1086 my $filter = Some::SAX::Filter->new(Handler => $writer);
1087 my $simple = XML::Simple->new(Handler => $filter);
1088 $simple->XMLout($ref);
1089
1090 You can also put XML::Simple at the opposite end of the pipeline to
1091 take advantage of the simple 'tree' data structure once the relevant
1092 data has been isolated through filtering:
1093
use XML::SAX::ParserFactory;
use Some::SAX::Filter;
use XML::Simple;
use Data::Dumper;

my $simple = XML::Simple->new(ForceArray => 1, KeyAttr => ['partnum']);
my $filter = Some::SAX::Filter->new(Handler => $simple);
my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);

my $ref = $parser->parse_uri('some_huge_file.xml');

# The folded entry is a hashref, so dump it rather than print it.
print Dumper($ref->{part}->{'555-1234'});
1105
1106 You can build a filter by using an XML::Simple object as a handler and
1107 setting its DataHandler option to point to a routine which takes the
1108 resulting tree, modifies it and sends it off as SAX events to a
1109 downstream handler:
1110
use XML::Simple;
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $writer = XML::SAX::Writer->new();
my $filter = XML::Simple->new(
    DataHandler => sub {
        my $simple = shift;
        my $data = shift;

        # Modify $data here

        $simple->XMLout($data, Handler => $writer);
    }
);
my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);

$parser->parse_uri($filename);
1125
1126 Note: In this last example, the 'Handler' option was specified in the
1127 call to XMLout() but it could also have been specified in the
1128 constructor.
1129
ENVIRONMENT
If you don't care which parser module XML::Simple uses then skip this
1132 section entirely (it looks more complicated than it really is).
1133
1134 XML::Simple will default to using a SAX parser if one is available or
1135 XML::Parser if SAX is not available.
1136
1137 You can dictate which parser module is used by setting either the
1138 environment variable 'XML_SIMPLE_PREFERRED_PARSER' or the package
1139 variable $XML::Simple::PREFERRED_PARSER to contain the module name.
1140 The following rules are used:
1141
1142 • The package variable takes precedence over the environment variable
1143 if both are defined. To force XML::Simple to ignore the
1144 environment settings and use its default rules, you can set the
1145 package variable to an empty string.
1146
1147 • If the 'preferred parser' is set to the string 'XML::Parser', then
1148 XML::Parser will be used (or XMLin() will die if XML::Parser is not
1149 installed).
1150
1151 • If the 'preferred parser' is set to some other value, then it is
1152 assumed to be the name of a SAX parser module and is passed to
1153 XML::SAX::ParserFactory. If XML::SAX is not installed, or the
1154 requested parser module is not installed, then XMLin() will die.
1155
1156 • If the 'preferred parser' is not defined at all (the normal default
1157 state), an attempt will be made to load XML::SAX. If XML::SAX is
1158 installed, then a parser module will be selected according to
1159 XML::SAX::ParserFactory's normal rules (which typically means the
1160 last SAX parser installed).
1161
• If the 'preferred parser' is not defined and XML::SAX is not
1163 installed, then XML::Parser will be used. XMLin() will die if
1164 XML::Parser is not installed.
1165
1166 Note: The XML::SAX distribution includes an XML parser written entirely
1167 in Perl. It is very portable but it is not very fast. You should
1168 consider installing XML::LibXML or XML::SAX::Expat if they are
1169 available for your platform.
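The precedence rules above can be restated as a small decision function. choose_backend() is a hypothetical helper written in plain Perl purely to illustrate the rules; it does not load or probe any module, and XML::Simple's own implementation may differ in detail. In real code you would simply set the variable before calling XMLin(), eg: $XML::Simple::PREFERRED_PARSER = 'XML::Parser';

```perl
use strict;
use warnings;

# Hypothetical helper restating the documented selection rules.
#   $pkg      - value of $XML::Simple::PREFERRED_PARSER (may be undef)
#   $env      - value of $ENV{XML_SIMPLE_PREFERRED_PARSER} (may be undef)
#   $have_sax - true if XML::SAX could be loaded
sub choose_backend {
    my ($pkg, $env, $have_sax) = @_;

    # The package variable wins whenever it is defined; an empty
    # string there means "ignore the environment, use the defaults".
    my $pref = defined $pkg ? $pkg : $env;

    if (defined $pref && length $pref) {
        return 'XML::Parser' if $pref eq 'XML::Parser';
        return "XML::SAX::ParserFactory -> $pref";   # named SAX parser
    }

    # Nothing preferred: XML::SAX if installed, otherwise XML::Parser.
    return $have_sax ? 'XML::SAX::ParserFactory default' : 'XML::Parser';
}

print choose_backend(undef, 'XML::SAX::Expat', 1), "\n";
print choose_backend('', 'XML::SAX::Expat', 1), "\n";
print choose_backend(undef, undef, 0), "\n";
```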
1170
ERROR HANDLING
The XML standard is very clear on the issue of non-compliant documents.
1173 An error in parsing any single element (for example a missing end tag)
1174 must cause the whole document to be rejected. XML::Simple will die
1175 with an appropriate message if it encounters a parsing error.
1176
1177 If dying is not appropriate for your application, you should arrange to
1178 call XMLin() in an eval block and look for errors in $@. eg:
1179
1180 my $config = eval { XMLin() };
1181 PopUpMessage($@) if($@);
1182
1183 Note, there is a common misconception that use of eval will
1184 significantly slow down a script. While that may be true when the code
1185 being eval'd is in a string, it is not true of code like the sample
1186 above.
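A slightly fuller version of the eval pattern, assuming XML::Simple is installed; the broken <config> document (its </server> end tag is missing) is invented for the illustration.

```perl
use strict;
use warnings;
use XML::Simple;

# Not well-formed: <server> is never closed.
my $broken = '<config><server name="gobi"></config>';

my $config = eval { XMLin($broken, KeyAttr => [], ForceArray => 0) };
if (my $err = $@) {
    # The whole document is rejected; report and carry on.
    warn "Could not parse configuration: $err";
}
print defined $config ? "parsed\n" : "no config loaded\n";
```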
1187
EXAMPLES
When XMLin() reads the following very simple piece of XML:
1190
1191 <opt username="testuser" password="frodo"></opt>
1192
1193 it returns the following data structure:
1194
1195 {
1196 'username' => 'testuser',
1197 'password' => 'frodo'
1198 }
1199
1200 The identical result could have been produced with this alternative
1201 XML:
1202
1203 <opt username="testuser" password="frodo" />
1204
1205 Or this (although see 'ForceArray' option for variations):
1206
1207 <opt>
1208 <username>testuser</username>
1209 <password>frodo</password>
1210 </opt>
1211
1212 Repeated nested elements are represented as anonymous arrays:
1213
1214 <opt>
1215 <person firstname="Joe" lastname="Smith">
1216 <email>joe@smith.com</email>
1217 <email>jsmith@yahoo.com</email>
1218 </person>
1219 <person firstname="Bob" lastname="Smith">
1220 <email>bob@smith.com</email>
1221 </person>
1222 </opt>
1223
1224 {
1225 'person' => [
1226 {
1227 'email' => [
1228 'joe@smith.com',
1229 'jsmith@yahoo.com'
1230 ],
1231 'firstname' => 'Joe',
1232 'lastname' => 'Smith'
1233 },
1234 {
1235 'email' => 'bob@smith.com',
1236 'firstname' => 'Bob',
1237 'lastname' => 'Smith'
1238 }
1239 ]
1240 }
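The difference between Joe's email (an arrayref) and Bob's (a plain scalar) is usually unwelcome in calling code. A sketch, assuming XML::Simple is installed, of using ForceArray so that every email value is an arrayref:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<opt>
  <person firstname="Joe" lastname="Smith">
    <email>joe@smith.com</email>
    <email>jsmith@yahoo.com</email>
  </person>
  <person firstname="Bob" lastname="Smith">
    <email>bob@smith.com</email>
  </person>
</opt>
XML

my $loose = XMLin($xml, KeyAttr => [], ForceArray => 0);
my $tight = XMLin($xml, KeyAttr => [], ForceArray => [ 'email' ]);

# Without ForceArray, Bob's single address is a plain scalar...
print ref($loose->{person}[1]{email}) || 'scalar', "\n";
# ...with ForceArray => ['email'] every person gets an arrayref.
print scalar @{ $tight->{person}[1]{email} }, "\n";
```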
1241
1242 Nested elements with a recognised key attribute are transformed
1243 (folded) from an array into a hash keyed on the value of that attribute
1244 (see the "KeyAttr" option):
1245
1246 <opt>
1247 <person key="jsmith" firstname="Joe" lastname="Smith" />
1248 <person key="tsmith" firstname="Tom" lastname="Smith" />
1249 <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
1250 </opt>
1251
1252 {
1253 'person' => {
1254 'jbloggs' => {
1255 'firstname' => 'Joe',
1256 'lastname' => 'Bloggs'
1257 },
1258 'tsmith' => {
1259 'firstname' => 'Tom',
1260 'lastname' => 'Smith'
1261 },
1262 'jsmith' => {
1263 'firstname' => 'Joe',
1264 'lastname' => 'Smith'
1265 }
1266 }
1267 }
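A sketch of the same document parsed with and without folding, assuming XML::Simple is installed. It relies on the default KeyAttr list including 'key'; setting KeyAttr => [] keeps the elements as an array in document order, with the key attribute intact.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<opt>
  <person key="jsmith" firstname="Joe" lastname="Smith" />
  <person key="tsmith" firstname="Tom" lastname="Smith" />
  <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
</opt>
XML

# Default KeyAttr: folded into a hash keyed on the "key" attribute.
my $folded = XMLin($xml, ForceArray => 0);
print $folded->{person}{tsmith}{firstname}, "\n";

# KeyAttr => []: no folding, so order and the key attribute survive.
my $list = XMLin($xml, KeyAttr => [], ForceArray => 0);
print join(',', map { $_->{key} } @{ $list->{person} }), "\n";
```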
1268
1269 The <anon> tag can be used to form anonymous arrays:
1270
1271 <opt>
1272 <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
1273 <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
1274 <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
1275 <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
1276 </opt>
1277
1278 {
1279 'head' => [
1280 [ 'Col 1', 'Col 2', 'Col 3' ]
1281 ],
1282 'data' => [
1283 [ 'R1C1', 'R1C2', 'R1C3' ],
1284 [ 'R2C1', 'R2C2', 'R2C3' ],
1285 [ 'R3C1', 'R3C2', 'R3C3' ]
1286 ]
1287 }
1288
Anonymous arrays can be nested to arbitrary levels and, as a special
case, if the surrounding tags for an XML document contain only an
anonymous array, the arrayref will be returned directly rather than the
usual hashref:
1293
1294 <opt>
1295 <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
1296 <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
1297 <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
1298 </opt>
1299
1300 [
1301 [ 'Col 1', 'Col 2' ],
1302 [ 'R1C1', 'R1C2' ],
1303 [ 'R2C1', 'R2C2' ]
1304 ]
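A sketch confirming the special case above, assuming XML::Simple is installed; the document is the table above trimmed to two rows.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt>'
        . '<anon><anon>Col 1</anon><anon>Col 2</anon></anon>'
        . '<anon><anon>R1C1</anon><anon>R1C2</anon></anon>'
        . '</opt>';

my $rows = XMLin($xml, KeyAttr => [], ForceArray => 0);

# The root holds only an anonymous array, so an arrayref comes back.
print ref($rows), "\n";
print join(' | ', map { join(',', @$_) } @$rows), "\n";
```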
1305
1306 Elements which only contain text content will simply be represented as
1307 a scalar. Where an element has both attributes and text content, the
1308 element will be represented as a hashref with the text content in the
1309 'content' key (see the "ContentKey" option):
1310
1311 <opt>
1312 <one>first</one>
1313 <two attr="value">second</two>
1314 </opt>
1315
1316 {
1317 'one' => 'first',
1318 'two' => { 'attr' => 'value', 'content' => 'second' }
1319 }
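A sketch of the attribute-plus-text case, assuming XML::Simple is installed, first with the default 'content' key and then with ContentKey changed to 'text' (an arbitrary name chosen for the illustration).

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><one>first</one><two attr="value">second</two></opt>';

my $default = XMLin($xml, KeyAttr => [], ForceArray => 0);
my $renamed = XMLin($xml, KeyAttr => [], ForceArray => 0,
                    ContentKey => 'text');

print "$default->{one} $default->{two}{content}\n";
print "$renamed->{two}{text}\n";
```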
1320
Mixed content (elements which contain both text content and nested
elements) will not be represented in a useful way - element order
and significant whitespace will be lost. If you need to work with
mixed content, then XML::Simple is not the right tool for your job -
check out the next section.
1326
WHERE TO FROM HERE?
XML::Simple is able to present a simple API because it makes some
1329 assumptions on your behalf. These include:
1330
1331 • You're not interested in text content consisting only of whitespace
1332
1333 • You don't mind that when things get slurped into a hash the order
1334 is lost
1335
1336 • You don't want fine-grained control of the formatting of generated
1337 XML
1338
1339 • You would never use a hash key that was not a legal XML element
1340 name
1341
1342 • You don't need help converting between different encodings
1343
1344 In a serious XML project, you'll probably outgrow these assumptions
1345 fairly quickly. This section of the document used to offer some advice
1346 on choosing a more powerful option. That advice has now grown into the
1347 'Perl-XML FAQ' document which you can find at:
1348 <http://perl-xml.sourceforge.net/faq/>
1349
The advice in the FAQ boils down to a quick explanation of tree-based
versus event-based parsers and then recommends:

For event-based parsing, use SAX (do not set out to write any new code
for XML::Parser's handler API - it is obsolete).

For tree-based parsing, you could choose between the 'Perlish' approach
of XML::Twig and more standards-based DOM implementations - preferably
one with XPath support such as XML::LibXML.
1359
SEE ALSO
XML::Simple requires either XML::Parser or XML::SAX.
1362
1363 To generate documents with namespaces, XML::NamespaceSupport is
1364 required.
1365
1366 The optional caching functions require Storable.
1367
1368 Answers to Frequently Asked Questions about XML::Simple are bundled
1369 with this distribution as: XML::Simple::FAQ
1370
COPYRIGHT
Copyright 1999-2004 Grant McLean <grantm@cpan.org>
1373
1374 This library is free software; you can redistribute it and/or modify it
1375 under the same terms as Perl itself.
1376
1377
1378
1379perl v5.36.0 2023-01-20 XML::Simple(3)