XML::Simple(3) User Contributed Perl Documentation XML::Simple(3)

NAME
XML::Simple - An API for simple XML files

SYNOPSIS
PLEASE DO NOT USE THIS MODULE IN NEW CODE. If you ignore this warning
and use it anyway, the "qw(:strict)" mode will save you a little pain.

    use XML::Simple qw(:strict);

    my $ref = XMLin([<xml file or string>] [, <options>]);

    my $xml = XMLout($hashref [, <options>]);

Or the object oriented way:

    use XML::Simple qw(:strict);

    my $xs = XML::Simple->new([<options>]);

    my $ref = $xs->XMLin([<xml file or string>] [, <options>]);

    my $xml = $xs->XMLout($hashref [, <options>]);

(or see "SAX SUPPORT" for 'the SAX way').

Note: in these examples, the square brackets denote optional items;
they do not imply that items should be supplied in arrayrefs.

STATUS OF THIS MODULE
The use of this module in new code is strongly discouraged. Other
modules are available which provide more straightforward and consistent
interfaces. In particular, XML::LibXML is highly recommended and you
can refer to Perl XML::LibXML by Example
<http://grantm.github.io/perl-libxml-by-example/> for a tutorial
introduction.

XML::Twig is another excellent alternative.

The major problems with this module are the large number of options
(some of which have unfortunate defaults) and the arbitrary ways in
which these options interact - often producing unexpected results.

Patches with bug fixes and documentation fixes are welcome, but new
features are unlikely to be added.

QUICK START
Say you have a script called foo and a file of configuration options
called foo.xml containing the following:

    <config logdir="/var/log/foo/" debugfile="/tmp/foo.debug">
      <server name="sahara" osname="solaris" osversion="2.6">
        <address>10.0.0.101</address>
        <address>10.0.1.101</address>
      </server>
      <server name="gobi" osname="irix" osversion="6.5">
        <address>10.0.0.102</address>
      </server>
      <server name="kalahari" osname="linux" osversion="2.0.34">
        <address>10.0.0.103</address>
        <address>10.0.1.103</address>
      </server>
    </config>

The following lines of code in foo:

    use XML::Simple qw(:strict);

    my $config = XMLin(
        undef,
        KeyAttr    => { server => 'name' },
        ForceArray => [ 'server', 'address' ],
    );

will 'slurp' the configuration options into the hashref $config
(because no filename or XML string was passed as the first argument to
"XMLin()", the name and location of the XML file will be inferred from
the name and location of the script). You can dump out the contents of
the hashref using Data::Dumper:

    use Data::Dumper;

    print Dumper($config);

which will produce something like this (formatting has been adjusted
for brevity):

    {
        'logdir'    => '/var/log/foo/',
        'debugfile' => '/tmp/foo.debug',
        'server'    => {
            'sahara' => {
                'osversion' => '2.6',
                'osname'    => 'solaris',
                'address'   => [ '10.0.0.101', '10.0.1.101' ]
            },
            'gobi' => {
                'osversion' => '6.5',
                'osname'    => 'irix',
                'address'   => [ '10.0.0.102' ]
            },
            'kalahari' => {
                'osversion' => '2.0.34',
                'osname'    => 'linux',
                'address'   => [ '10.0.0.103', '10.0.1.103' ]
            }
        }
    }

Your script could then access the name of the log directory like this:

    print $config->{logdir};

Similarly, the second address on the server 'kalahari' could be
referenced as:

    print $config->{server}->{kalahari}->{address}->[1];

Note: If the mapping between the output of Data::Dumper and the print
statements above is not obvious to you, then please refer to the
'references' tutorial (AKA: "Mark's very short tutorial about
references") at perlreftut.

In this example, the "ForceArray" option was used to list elements that
might occur multiple times and should therefore be represented as
arrayrefs (even when only one element is present).

The "KeyAttr" option was used to indicate that each "<server>" element
has a unique identifier in the "name" attribute. This allows you to
index directly to a particular server record using the name as a hash
key (as shown above).

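As a rough illustration of working with the folded structure, the
sketch below walks every server record rather than indexing one by
name. It assumes the XML is supplied as an inline string (a shortened
copy of foo.xml) purely so that it is self-contained; the @report
array is an illustrative name, not part of the module.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

# Shortened inline copy of foo.xml, used only to keep the sketch
# self-contained.
my $xml = <<'XML';
<config logdir="/var/log/foo/" debugfile="/tmp/foo.debug">
  <server name="sahara" osname="solaris" osversion="2.6">
    <address>10.0.0.101</address>
    <address>10.0.1.101</address>
  </server>
  <server name="gobi" osname="irix" osversion="6.5">
    <address>10.0.0.102</address>
  </server>
</config>
XML

my $config = XMLin(
    $xml,
    KeyAttr    => { server => 'name' },
    ForceArray => [ 'server', 'address' ],
);

# $config->{server} is a hash keyed on the 'name' attribute, and every
# 'address' value is an arrayref because of ForceArray.
my @report;
for my $name (sort keys %{ $config->{server} }) {
    my $server = $config->{server}{$name};
    push @report,
        "$name ($server->{osname}): " . join(', ', @{ $server->{address} });
}
print "$_\n" for @report;
```

Because 'address' is listed in ForceArray, the join() works the same
way for 'gobi' (one address) as for 'sahara' (two).
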
For simple requirements, that's really all there is to it. If you want
to store your XML in a different directory or file, or pass it in as a
string or even pass it in via some derivative of an IO::Handle, you'll
need to check out "OPTIONS". If you want to turn off or tweak the
array folding feature (that neat little transformation that produced
$config->{server}) you'll find options for that as well.

If you want to generate XML (for example to write a modified version of
$config back out as XML), check out "XMLout()".

If your needs are not so simple, this may not be the module for you.
In that case, you might want to read "WHERE TO FROM HERE?".

DESCRIPTION
The XML::Simple module provides a simple API layer on top of an
underlying XML parsing module (either XML::Parser or one of the SAX2
parser modules). Two functions are exported: "XMLin()" and "XMLout()".
Note: you can explicitly request the lower case versions of the
function names: "xml_in()" and "xml_out()".

The simplest approach is to call these two functions directly, but an
optional object oriented interface (see "OPTIONAL OO INTERFACE" below)
allows them to be called as methods of an XML::Simple object. The
object interface can also be used at either end of a SAX pipeline.

XMLin()
Parses XML formatted data and returns a reference to a data structure
which contains the same information in a more readily accessible form.
(Skip down to "EXAMPLES" below for more sample code.)

"XMLin()" accepts an optional XML specifier followed by zero or more
'name => value' option pairs. The XML specifier can be one of the
following:

A filename
    If the filename contains no directory components, "XMLin()" will
    look for the file in each directory in the SearchPath (see
    "OPTIONS" below) or in the current directory if the SearchPath
    option is not defined. eg:

        $ref = XMLin('/etc/params.xml');

    Note, the filename '-' can be used to parse from STDIN.

undef
    If there is no XML specifier, "XMLin()" will check the script
    directory and each of the SearchPath directories for a file with
    the same name as the script but with the extension '.xml'. Note:
    if you wish to specify options, you must specify the value 'undef'.
    eg:

        $ref = XMLin(undef, ForceArray => 1);

A string of XML
    A string containing XML (recognised by the presence of '<' and '>'
    characters) will be parsed directly. eg:

        $ref = XMLin('<opt username="bob" password="flurp" />');

An IO::Handle object
    An IO::Handle object will be read to EOF and its contents parsed.
    eg:

        use IO::File;

        $fh = IO::File->new('/etc/params.xml');
        $ref = XMLin($fh);

XMLout()
Takes a data structure (generally a hashref) and returns an XML
encoding of that structure. If the resulting XML is parsed using
"XMLin()", it should return a data structure equivalent to the original
(see caveats below).

The "XMLout()" function can also be used to output the XML as SAX
events (see the "Handler" option and "SAX SUPPORT" for more details).

When translating hashes to XML, hash keys which have a leading '-' will
be silently skipped. This is the approved method for marking elements
of a data structure which should be ignored by "XMLout". (Note: If
these items were not skipped, the key names would be emitted as element
or attribute names with a leading '-', which would not be valid XML.)

Caveats
Some care is required in creating data structures which will be passed
to "XMLout()". Hash keys from the data structure will be encoded as
either XML element names or attribute names. Therefore, you should use
hash key names which conform to the relatively strict XML naming rules:

Names in XML must begin with a letter or an underscore. The remaining
characters may be letters, digits, hyphens (-), underscores (_) or full
stops (.). It is also allowable to include one colon (:) in an element
name but this should only be used when working with namespaces
(XML::Simple can only usefully work with namespaces when teamed with a
SAX Parser).

You can use other punctuation characters in hash values (just not in
hash keys). However, XML::Simple does not support dumping binary data.

If you break these rules, the current implementation of "XMLout()" will
simply emit non-compliant XML which will be rejected if you try to read
it back in. (A later version of XML::Simple might take a more
proactive approach).

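Since "XMLout()" does not check key names itself, one option is to
check them before calling it. The sketch below is only an illustration
of the naming rules in this section: invalid_xml_keys() is a
hypothetical helper, not part of XML::Simple, and its pattern is an
ASCII-only approximation (real XML names may also contain non-ASCII
letters).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Simple): recursively collect
# hash keys that would not be usable as element or attribute names.
# Keys with a leading '-' are ignored because XMLout() skips them.
sub invalid_xml_keys {
    my ($data, $found) = @_;
    $found ||= [];
    if (ref $data eq 'HASH') {
        for my $key (sort keys %$data) {
            next if $key =~ /^-/;
            # ASCII-only approximation: a name, optionally followed by
            # one colon and a second name.
            push @$found, $key
                unless $key =~ /^[A-Za-z_][A-Za-z0-9._-]*(?::[A-Za-z_][A-Za-z0-9._-]*)?$/;
            invalid_xml_keys($data->{$key}, $found);
        }
    }
    elsif (ref $data eq 'ARRAY') {
        invalid_xml_keys($_, $found) for @$data;
    }
    return $found;
}

my $data = {
    'server'    => { 'os name' => 'linux', '2nd_ip' => '10.0.1.103' },
    '-private'  => 'skipped by XMLout',
    'xsl:param' => 'allowed, but only sensible with namespaces',
};

my $bad = invalid_xml_keys($data);
print "invalid keys: @$bad\n";
```

Here 'os name' (contains a space) and '2nd_ip' (starts with a digit)
are reported, while '-private' is ignored and 'xsl:param' passes.
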
Note also that although you can nest hashes and arrays to arbitrary
levels, circular data structures are not supported and will cause
"XMLout()" to die.

If you wish to 'round-trip' arbitrary data structures from Perl to XML
and back to Perl, then you should probably disable array folding (using
the KeyAttr option) both with "XMLout()" and with "XMLin()". If you
still don't get the expected results, you may prefer to use XML::Dumper
which is designed for exactly that purpose.

Refer to "WHERE TO FROM HERE?" if "XMLout()" is too simple for your
needs.

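The round-trip advice in the caveats above might look like the
following minimal sketch. The data and the 'config' root name are
made up for illustration; the point is only that an empty KeyAttr
list is passed to both calls, so no unfolding happens on output and
no folding happens on input.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

# Illustrative data: an array of records that each have a 'name' key.
my $data = {
    server => [
        { name => 'sahara', osname => 'solaris' },
        { name => 'gobi',   osname => 'irix'    },
    ],
};

# KeyAttr => [] disables unfolding in XMLout() and folding in XMLin().
my $xml  = XMLout($data, KeyAttr => [], RootName => 'config');
my $copy = XMLin($xml, KeyAttr => [], ForceArray => [ 'server' ]);

print $xml;
print "server is still an ", ref($copy->{server}), "\n";
```

Without the empty KeyAttr list on input, the default key attributes
would fold the 'server' array into a hash keyed on 'name', and the
structure read back would no longer match the original.
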
OPTIONS
XML::Simple supports a number of options (in fact as each release of
XML::Simple adds more options, the module's claim to the name 'Simple'
becomes increasingly tenuous). If you find yourself repeatedly having
to specify the same options, you might like to investigate "OPTIONAL OO
INTERFACE" below.

If you can't be bothered reading the documentation, refer to "STRICT
MODE" to automatically catch common mistakes.

Because there are so many options, it's hard for new users to know
which ones are important, so here are the two you really need to know
about:

•   check out "ForceArray" because you'll almost certainly want to turn
    it on

•   make sure you know what the "KeyAttr" option does and what its
    default value is because it may surprise you otherwise (note in
    particular that 'KeyAttr' affects both "XMLin" and "XMLout")

The option name headings below have a trailing 'comment' - a hash
character (#) followed by two pieces of metadata:

•   Options are marked with 'in' if they are recognised by "XMLin()"
    and 'out' if they are recognised by "XMLout()".

•   Each option is also flagged to indicate whether it is:

     'important'   - don't use the module until you understand this one
     'handy'       - you can skip this on the first time through
     'advanced'    - you can skip this on the second time through
     'SAX only'    - don't worry about this unless you're using SAX (or
                     alternatively if you need this, you also need SAX)
     'seldom used' - you'll probably never use this unless you were the
                     person that requested the feature

The options are listed alphabetically.

Note: option names are no longer case sensitive so you can use the
mixed case versions shown here; all lower case as required by versions
2.03 and earlier; or you can add underscores between the words (eg:
key_attr).

AttrIndent => 1 # out - handy
When you are using "XMLout()", enable this option to have attributes
printed one-per-line with sensible indentation rather than all on one
line.

Cache => [ cache schemes ] # in - advanced
Because loading the XML::Parser module and parsing an XML file can
consume a significant number of CPU cycles, it is often desirable to
cache the output of "XMLin()" for later reuse.

When parsing from a named file, XML::Simple supports a number of
caching schemes. The 'Cache' option may be used to specify one or more
schemes (using an anonymous array). Each scheme will be tried in turn
in the hope of finding a cached pre-parsed representation of the XML
file. If no cached copy is found, the file will be parsed and the
first cache scheme in the list will be used to save a copy of the
results. The following cache schemes have been implemented:

storable
    Utilises Storable.pm to read/write a cache file with the same name
    as the XML file but with the extension .stor

memshare
    When a file is first parsed, a copy of the resulting data structure
    is retained in memory in the XML::Simple module's namespace.
    Subsequent calls to parse the same file will return a reference to
    this structure. This cached version will persist only for the life
    of the Perl interpreter (which in the case of mod_perl for example,
    may be some significant time).

    Because each caller receives a reference to the same data
    structure, a change made by one caller will be visible to all. For
    this reason, the reference returned should be treated as read-only.

memcopy
    This scheme works identically to 'memshare' (above) except that
    each caller receives a reference to a new data structure which is a
    copy of the cached version. Copying the data structure will add a
    little processing overhead, therefore this scheme should only be
    used where the caller intends to modify the data structure (or
    wishes to protect itself from others who might). This scheme uses
    Storable.pm to perform the copy.

Warning! The memory-based caching schemes compare the timestamp on the
file to the time when it was last parsed. If the file is stored on an
NFS filesystem (or other network share) and the clock on the file
server is not exactly synchronised with the clock where your script is
run, updates to the source XML file may appear to be ignored.

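A minimal sketch of the 'storable' scheme follows. The scratch
directory, the params.xml file name and its contents are assumptions
made for the example; only the 'Cache' option itself comes from the
description above. Whether the second call is actually served from
the cache depends on file timestamps, so the sketch only relies on
both calls returning the same data.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use XML::Simple qw(:strict);

# Write a small XML file to a scratch directory (illustrative path).
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'params.xml');
open my $out, '>', $file or die "open($file): $!";
print {$out} '<opt><user login="grep" fullname="Gary R Epstein" /></opt>';
close $out or die "close($file): $!";

my %opts = (
    KeyAttr    => { user => 'login' },
    ForceArray => [ 'user' ],
    Cache      => [ 'storable' ],
);

my $first  = XMLin($file, %opts);   # parses the XML and saves a cache copy
my $second = XMLin($file, %opts);   # may be served from the cache file

print $second->{user}{'grep'}{fullname}, "\n";
```

Caching only applies when parsing from a named file, which is why the
sketch writes the XML to disk instead of passing a string.
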
ContentKey => 'keyname' # in+out - seldom used
When text content is parsed to a hash value, this option lets you
specify a name for the hash key to override the default 'content'. So
for example:

    XMLin('<opt one="1">Text</opt>', ContentKey => 'text')

will parse to:

    { 'one' => 1, 'text' => 'Text' }

instead of:

    { 'one' => 1, 'content' => 'Text' }

"XMLout()" will also honour the value of this option when converting a
hashref to XML.

You can also prefix your selected key name with a '-' character to have
"XMLin()" try a little harder to eliminate unnecessary 'content' keys
after array folding. For example:

    XMLin(
        '<opt><item name="one">First</item><item name="two">Second</item></opt>',
        KeyAttr    => { item => 'name' },
        ForceArray => [ 'item' ],
        ContentKey => '-content'
    )

will parse to:

    {
        'item' => {
            'one' => 'First',
            'two' => 'Second'
        }
    }

rather than this (without the '-'):

    {
        'item' => {
            'one' => { 'content' => 'First' },
            'two' => { 'content' => 'Second' }
        }
    }

DataHandler => code_ref # in - SAX only
When you use an XML::Simple object as a SAX handler, it will return a
'simple tree' data structure in the same format as "XMLin()" would
return. If this option is set (to a subroutine reference), then when
the tree is built the subroutine will be called and passed two
arguments: a reference to the XML::Simple object and a reference to the
data tree. The return value from the subroutine will be returned to
the SAX driver. (See "SAX SUPPORT" for more details).

ForceArray => 1 # in - important
This option should be set to '1' to force nested elements to be
represented as arrays even when there is only one. Eg, with ForceArray
enabled, this XML:

    <opt>
      <name>value</name>
    </opt>

would parse to this:

    {
        'name' => [
            'value'
        ]
    }

instead of this (the default):

    {
        'name' => 'value'
    }

This option is especially useful if the data structure is likely to be
written back out as XML and the default behaviour of rolling single
nested elements up into attributes is not desirable.

If you are using the array folding feature, you should almost certainly
enable this option. If you do not, single nested elements will not be
parsed to arrays and therefore will not be candidates for folding to a
hash. (Given that the default value of 'KeyAttr' enables array
folding, the default value of this option should probably also have
been enabled too - sorry).

ForceArray => [ names ] # in - important
This alternative (and preferred) form of the 'ForceArray' option allows
you to specify a list of element names which should always be forced
into an array representation, rather than the 'all or nothing' approach
above.

It is also possible (since version 2.05) to include compiled regular
expressions in the list - any element names which match the pattern
will be forced to arrays. If the list contains only a single regex,
then it is not necessary to enclose it in an arrayref. Eg:

    ForceArray => qr/_list$/

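To make the regex form concrete, here is a small sketch. The element
names 'host_list' and 'owner' are invented for the example; array
folding is switched off with an empty KeyAttr list so that only the
effect of ForceArray is visible.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $ref = XMLin(
    '<opt><host_list>sahara</host_list><owner>bob</owner></opt>',
    ForceArray => qr/_list$/,   # only names ending in '_list' are forced
    KeyAttr    => [],
);

# host_list matched the pattern, so it is an arrayref even though the
# element occurs only once; owner did not match and stays a scalar.
print "host_list: ", join(', ', @{ $ref->{host_list} }), "\n";
print "owner:     $ref->{owner}\n";
```
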
ForceContent => 1 # in - seldom used
When "XMLin()" parses elements which have text content as well as
attributes, the text content must be represented as a hash value rather
than a simple scalar. This option allows you to force text content to
always parse to a hash value even when there are no attributes. So for
example:

    XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)

will parse to:

    {
        'x' => { 'content' => 'text1' },
        'y' => { 'a' => 2, 'content' => 'text2' }
    }

instead of:

    {
        'x' => 'text1',
        'y' => { 'a' => 2, 'content' => 'text2' }
    }

GroupTags => { grouping tag => grouped tag } # in+out - handy
You can use this option to eliminate extra levels of indirection in
your Perl data structure. For example this XML:

    <opt>
      <searchpath>
        <dir>/usr/bin</dir>
        <dir>/usr/local/bin</dir>
        <dir>/usr/X11/bin</dir>
      </searchpath>
    </opt>

would normally be read into a structure like this:

    {
        searchpath => {
            dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
        }
    }

But when read in with the appropriate value for 'GroupTags':

    my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });

it will return this simpler structure:

    {
        searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
    }

The grouping element ("<searchpath>" in the example) must not contain
any attributes or elements other than the grouped element.

You can specify multiple 'grouping element' to 'grouped element'
mappings in the same hashref. If this option is combined with
"KeyAttr", the array folding will occur first and then the grouped
element names will be eliminated.

"XMLout" will also use the grouptag mappings to re-introduce the tags
around the grouped elements. Beware though that this will occur in all
places that the 'grouping tag' name occurs - you probably don't want to
use the same name for elements as well as attributes.

Handler => object_ref # out - SAX only
Use the 'Handler' option to have "XMLout()" generate SAX events rather
than returning a string of XML. For more details see "SAX SUPPORT"
below.

Note: the current implementation of this option generates a string of
XML and uses a SAX parser to translate it into SAX events. The normal
encoding rules apply here - your data must be UTF8 encoded unless you
specify an alternative encoding via the 'XMLDecl' option; and by the
time the data reaches the handler object, it will be in UTF8 form
regardless of the encoding you supply. A future implementation of this
option may generate the events directly.

KeepRoot => 1 # in+out - handy
In its attempt to return a data structure free of superfluous detail
and unnecessary levels of indirection, "XMLin()" normally discards the
root element name. Setting the 'KeepRoot' option to '1' will cause the
root element name to be retained. So after executing this code:

    $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1);

You'll be able to reference the tempdir as
"$config->{config}->{tempdir}" instead of the default
"$config->{tempdir}".

Similarly, setting the 'KeepRoot' option to '1' will tell "XMLout()"
that the data structure already contains a root element name and it is
not necessary to add another.

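The sketch below shows both directions together, as a rough
illustration only. It uses strict mode, so the KeyAttr and ForceArray
options (here empty lists, meaning no folding and no forced arrays)
have to be supplied even though they are not the point of the
example.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $config = XMLin(
    '<config tempdir="/tmp" />',
    KeepRoot   => 1,
    KeyAttr    => [],
    ForceArray => [],
);

# With KeepRoot, the root element name is the only top-level key.
my ($root) = keys %$config;
print "root element: $root\n";
print "tempdir:      $config->{config}->{tempdir}\n";

# On output, KeepRoot tells XMLout() to treat that top-level key as the
# root element instead of wrapping the data in another element.
my $xml = XMLout($config, KeepRoot => 1, KeyAttr => []);
print $xml;
```
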
KeyAttr => [ list ] # in+out - important
This option controls the 'array folding' feature which translates
nested elements from an array to a hash. It also controls the
'unfolding' of hashes to arrays.

For example, this XML:

    <opt>
      <user login="grep" fullname="Gary R Epstein" />
      <user login="stty" fullname="Simon T Tyson" />
    </opt>

would, by default, parse to this:

    {
        'user' => [
            {
                'login'    => 'grep',
                'fullname' => 'Gary R Epstein'
            },
            {
                'login'    => 'stty',
                'fullname' => 'Simon T Tyson'
            }
        ]
    }

If the option 'KeyAttr => "login"' were used to specify that the
'login' attribute is a key, the same XML would parse to:

    {
        'user' => {
            'stty' => {
                'fullname' => 'Simon T Tyson'
            },
            'grep' => {
                'fullname' => 'Gary R Epstein'
            }
        }
    }

The key attribute names should be supplied in an arrayref if there is
more than one. "XMLin()" will attempt to match attribute names in the
order supplied. "XMLout()" will use the first attribute name supplied
when 'unfolding' a hash into an array.

Note 1: The default value for 'KeyAttr' is ['name', 'key', 'id']. If
you do not want folding on input or unfolding on output you must set
this option to an empty list to disable the feature.

Note 2: If you wish to use this option, you should also enable the
"ForceArray" option. Without 'ForceArray', a single nested element
will be rolled up into a scalar rather than an array and therefore will
not be folded (since only arrays get folded).

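Note 1 is the one that most often catches people out, so here is a
short sketch of the default in action. The <disk> elements are
invented for the example, and the module is deliberately loaded
without ':strict' so that the default 'KeyAttr' value is allowed to
apply.

```perl
use strict;
use warnings;
use XML::Simple;   # deliberately without :strict, to show the default

my $xml = '<opt><disk name="sda" size="500" /><disk name="sdb" size="250" /></opt>';

# No KeyAttr given: the default ['name', 'key', 'id'] applies, so the
# <disk> elements are folded into a hash keyed on 'name'.
my $folded = XMLin($xml);

# An empty list disables folding: the elements stay in an array, in
# document order, and each keeps its 'name' attribute.
my $list = XMLin($xml, KeyAttr => []);

print "folded: ", join(', ', sort keys %{ $folded->{disk} }), "\n";
print "list:   ", join(', ', map { $_->{name} } @{ $list->{disk} }), "\n";
```

In the folded form the 'name' value has moved into the hash key, so
code written against one shape will not work against the other.
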
KeyAttr => { list } # in+out - important
This alternative (and preferred) method of specifying the key
attributes allows more fine grained control over which elements are
folded and on which attributes. For example the option "KeyAttr => {
package => 'id' }" will cause any package elements to be folded on the
'id' attribute. No other elements which have an 'id' attribute will be
folded at all.

Note: "XMLin()" will generate a warning (or a fatal error in "STRICT
MODE") if this syntax is used and an element which does not have the
specified key attribute is encountered (eg: a 'package' element without
an 'id' attribute, to use the example above). Warnings can be
suppressed with the lexical "no warnings;" pragma or "no warnings
'XML::Simple';".

Two further variations are made possible by prefixing a '+' or a '-'
character to the attribute name:

The option 'KeyAttr => { user => "+login" }' will cause this XML:

    <opt>
      <user login="grep" fullname="Gary R Epstein" />
      <user login="stty" fullname="Simon T Tyson" />
    </opt>

to parse to this data structure:

    {
        'user' => {
            'stty' => {
                'fullname' => 'Simon T Tyson',
                'login'    => 'stty'
            },
            'grep' => {
                'fullname' => 'Gary R Epstein',
                'login'    => 'grep'
            }
        }
    }

The '+' indicates that the value of the key attribute should be copied
rather than moved to the folded hash key.

A '-' prefix would produce this result:

    {
        'user' => {
            'stty' => {
                'fullname' => 'Simon T Tyson',
                '-login'   => 'stty'
            },
            'grep' => {
                'fullname' => 'Gary R Epstein',
                '-login'   => 'grep'
            }
        }
    }

As described earlier, "XMLout" will ignore hash keys starting with a
'-'.

NoAttr => 1 # in+out - handy
When used with "XMLout()", the generated XML will contain no
attributes. All hash key/values will be represented as nested elements
instead.

When used with "XMLin()", any attributes in the XML will be ignored.

NoEscape => 1 # out - seldom used
By default, "XMLout()" will translate the characters '<', '>', '&' and
'"' to '&lt;', '&gt;', '&amp;' and '&quot;' respectively. Use this
option to suppress escaping (presumably because you've already escaped
the data in some more sophisticated manner).

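For readers who do want to escape data themselves, the default
mapping ('<' to '&lt;', '>' to '&gt;', '&' to '&amp;' and '"' to
'&quot;') can be sketched in a few lines. escape_xml() is a
hypothetical helper written for this illustration; it is not
XML::Simple's own code.

```perl
use strict;
use warnings;

# Hypothetical helper that applies the same four substitutions.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # first, so the '&' in later entities is not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $escaped = escape_xml('if (a < b && b > c) { say "ok" }');
print "$escaped\n";
```

The order matters: escaping '&' last would turn an already produced
'&lt;' into '&amp;lt;'.
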
NoIndent => 1 # out - seldom used
Set this option to 1 to disable "XMLout()"'s default 'pretty printing'
mode. With this option enabled, the XML output will all be on one line
(unless there are newlines in the data) - this may be easier for
downstream processing.

NoSort => 1 # out - seldom used
Newer versions of XML::Simple sort elements and attributes
alphabetically (*), by default. Enable this option to suppress the
sorting - possibly for backwards compatibility.

* Actually, sorting is alphabetical but 'key' attribute or element
names (as in 'KeyAttr') sort first. Also, when a hash of hashes is
'unfolded', the elements are sorted alphabetically by the value of the
key field.

NormaliseSpace => 0 | 1 | 2 # in - handy
This option controls how whitespace in text content is handled.
Recognised values for the option are:

•   0 = (default) whitespace is passed through unaltered (except of
    course for the normalisation of whitespace in attribute values
    which is mandated by the XML recommendation)

•   1 = whitespace is normalised in any value used as a hash key
    (normalising means removing leading and trailing whitespace and
    collapsing sequences of whitespace characters to a single space)

•   2 = whitespace is normalised in all text content

Note: you can spell this option with a 'z' if that is more natural for
you.

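The normalisation rule given in the list above (strip leading and
trailing whitespace, collapse internal runs to a single space) can be
sketched as follows. normalise_space() is a hypothetical helper used
only to illustrate the rule; it is not a function exported by
XML::Simple.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the normalisation rule.
sub normalise_space {
    my ($text) = @_;
    $text =~ s/^\s+//;      # remove leading whitespace
    $text =~ s/\s+$//;      # remove trailing whitespace
    $text =~ s/\s+/ /g;     # collapse each remaining run to one space
    return $text;
}

my $key = normalise_space("  first\n   name \t");
print "[$key]\n";
```

With a setting of 1 this kind of clean-up matters mostly for folded
hash keys, where stray newlines or indentation in the XML would
otherwise produce keys that are awkward to look up.
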
NSExpand => 1 # in+out handy - SAX only
This option controls namespace expansion - the translation of element
and attribute names of the form 'prefix:name' to '{uri}name'. For
example the element name 'xsl:template' might be expanded to:
'{http://www.w3.org/1999/XSL/Transform}template'.

By default, "XMLin()" will return element names and attribute names
exactly as they appear in the XML. Setting this option to 1 will cause
all element and attribute names to be expanded to include their
namespace URI.

Note: You must be using a SAX parser for this option to work (ie: it
does not work with XML::Parser).

This option also controls whether "XMLout()" performs the reverse
translation from '{uri}name' back to 'prefix:name'. The default is no
translation. If your data contains expanded names, you should set this
option to 1 otherwise "XMLout" will emit XML which is not well formed.

Note: You must have the XML::NamespaceSupport module installed if you
want "XMLout()" to translate URIs back to prefixes.

NumericEscape => 0 | 1 | 2 # out - handy
Use this option to have 'high' (non-ASCII) characters in your Perl data
structure converted to numeric entities (eg: &#8364;) in the XML
output. Three levels are possible:

0 - default: no numeric escaping (OK if you're writing out UTF8)

1 - only characters above 0xFF are escaped (ie: characters in the
0x80-FF range are not escaped), possibly useful with ISO8859-1 output

2 - all characters above 0x7F are escaped (good for plain ASCII output)

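The difference between the levels is easiest to see on a string that
contains one character from each range. The sketch below mimics the
rules as described above; numeric_escape() is a hypothetical helper
written for illustration and is not part of the module's interface.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the three levels.
sub numeric_escape {
    my ($text, $level) = @_;
    return $text unless $level;                             # level 0: no escaping
    if ($level == 2) {
        $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;  # everything above 0x7F
    }
    else {
        $text =~ s/([^\x00-\xFF])/'&#' . ord($1) . ';'/ge;  # only above 0xFF
    }
    return $text;
}

# e-acute (U+00E9) falls in the 0x80-FF range; the euro sign (U+20AC)
# is above 0xFF.
my $text   = "caf\x{E9} \x{20AC}5";
my $level1 = numeric_escape($text, 1);   # only the euro sign is escaped
my $level2 = numeric_escape($text, 2);   # both are escaped

print "$level2\n";
```

Level 1 leaves the e-acute alone because it can be represented
directly in ISO8859-1 output; level 2 produces pure ASCII.
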
OutputFile => <file specifier> # out - handy
The default behaviour of "XMLout()" is to return the XML as a string.
If you wish to write the XML to a file, simply supply the filename
using the 'OutputFile' option.

This option also accepts an IO handle object - especially useful in
Perl 5.8.0 and later for output using an encoding other than UTF-8, eg:

    open my $fh, '>:encoding(iso-8859-1)', $path or die "open($path): $!";
    XMLout($ref, OutputFile => $fh);

Note, XML::Simple does not require that the object you pass in to the
OutputFile option inherits from IO::Handle - it simply assumes the
object supports a "print" method.

ParserOpts => [ XML::Parser Options ] # in - don't use this
Note: This option is now officially deprecated. If you find it useful,
email the author with an example of what you use it for. Do not use
this option to set the ProtocolEncoding, that's just plain wrong - fix
the XML.

This option allows you to pass parameters to the constructor of the
underlying XML::Parser object (which of course assumes you're not using
SAX).

RootName => 'string' # out - handy
By default, when "XMLout()" generates XML, the root element will be
named 'opt'. This option allows you to specify an alternative name.

Specifying either undef or the empty string for the RootName option
will produce XML with no root elements. In most cases the resulting
XML fragment will not be 'well formed' and therefore could not be read
back in by "XMLin()". Nevertheless, the option has been found to be
useful in certain circumstances.

SearchPath => [ list ] # in - handy
If you pass "XMLin()" a filename, but the filename includes no
directory component, you can use this option to specify which
directories should be searched to locate the file. You might use this
option to search first in the user's home directory, then in a global
directory such as /etc.

If a filename is provided to "XMLin()" but SearchPath is not defined,
the file is assumed to be in the current directory.

If the first parameter to "XMLin()" is undefined, the default
SearchPath will contain only the directory in which the script itself
is located. Otherwise the default SearchPath will be empty.

StrictMode => 1 | 0 # in+out seldom used
This option allows you to turn "STRICT MODE" on or off for a particular
call, regardless of whether it was enabled at the time XML::Simple was
loaded.

SuppressEmpty => 1 | '' | undef # in+out - handy
This option controls what "XMLin()" should do with empty elements (no
attributes and no content). The default behaviour is to represent them
as empty hashes. Setting this option to a true value (eg: 1) will
cause empty elements to be skipped altogether. Setting the option to
'undef' or the empty string will cause empty elements to be represented
as the undefined value or the empty string respectively. The latter
two alternatives are a little easier to test for in your code than a
hash with no keys.

The option also controls what "XMLout()" does with undefined values.
Setting the option to undef causes undefined values to be output as
empty elements (rather than empty attributes); it also suppresses the
generation of warnings about undefined values. Setting the option to a
true value (eg: 1) causes undefined values to be skipped altogether on
output.

803 ValueAttr => [ names ] # in - handy
804 Use this option to deal elements which always have a single attribute
805 and no content. Eg:
806
807 <opt>
808 <colour value="red" />
809 <size value="XXL" />
810 </opt>
811
812 Setting "ValueAttr => [ 'value' ]" will cause the above XML to parse
813 to:
814
815 {
816 colour => 'red',
817 size => 'XXL'
818 }
819
820 instead of this (the default):
821
822 {
823 colour => { value => 'red' },
824 size => { value => 'XXL' }
825 }
826
827 Note: This form of the ValueAttr option is not compatible with
828 "XMLout()" - since the attribute name is discarded at parse time, the
829 original XML cannot be reconstructed.
830
831 ValueAttr => { element => attribute, ... } # in+out - handy
832 This (preferred) form of the ValueAttr option requires you to specify
833 both the element and the attribute names. This is not only safer, it
834 also allows the original XML to be reconstructed by "XMLout()".
835
836 Note: You probably don't want to use this option and the NoAttr option
837 at the same time.
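
As a rough illustration of the hash form, the sketch below parses the colour/size document from the previous entry and then writes it back out. The variable names are illustrative only. Note that "ForceArray" is an input-only option, so it is passed to "XMLin()" but not to "XMLout()".

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><colour value="red" /><size value="XXL" /></opt>';

# Hash form: element name => name of the attribute to fold away.
my %value_attr = (colour => 'value', size => 'value');

my $ref = XMLin(
    $xml,
    ValueAttr  => \%value_attr,
    KeyAttr    => [],
    ForceArray => 0,
);
# $ref is now { colour => 'red', size => 'XXL' }

# Since the element/attribute pairing is known, XMLout() can put the
# value="..." attributes back.
my $out = XMLout(
    $ref,
    ValueAttr => \%value_attr,
    KeyAttr   => [],
    RootName  => 'opt',
);
print $out;
```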
838
839 Variables => { name => value } # in - handy
840 This option allows variables in the XML to be expanded when the file is
841 read. (There is no facility for putting the variable names back if you
842 regenerate XML using "XMLout()".)
843
844 A 'variable' is any text of the form "${name}" which occurs in an
845 attribute value or in the text content of an element. If 'name'
846 matches a key in the supplied hashref, "${name}" will be replaced with
847 the corresponding value from the hashref. If no matching key is found,
848 the variable will not be replaced. Names must match the regex:
849 "[\w.]+" (ie: only 'word' characters and dots are allowed).
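
A small sketch of the expansion rules, using made-up element and variable names. The XML is held in a single-quoted Perl string so that Perl itself does not try to interpolate "${prefix}".

```perl
use strict;
use warnings;
use XML::Simple;

# Single quotes stop Perl from interpolating ${...} before XMLin() sees it.
my $xml = '<opt><bindir>${prefix}/bin</bindir><other>${unknown.name}</other></opt>';

my $ref = XMLin(
    $xml,
    Variables  => { prefix => '/usr/local' },
    KeyAttr    => [],
    ForceArray => 0,
);

print "$ref->{bindir}\n";   # /usr/local/bin
print "$ref->{other}\n";    # no matching key, so the text is left untouched
```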
850
851 VarAttr => 'attr_name' # in - handy
852 In addition to the variables defined using "Variables", this option
853 allows variables to be defined in the XML. A variable definition
854 consists of an element with an attribute called 'attr_name' (the value
855 of the "VarAttr" option). The value of the attribute will be used as
856 the variable name and the text content of the element will be used as
857 the value. A variable defined in this way will override a variable
858 defined using the "Variables" option. For example:
859
860 XMLin( '<opt>
861 <dir name="prefix">/usr/local/apache</dir>
862 <dir name="exec_prefix">${prefix}</dir>
863 <dir name="bindir">${exec_prefix}/bin</dir>
864 </opt>',
865 VarAttr => 'name', ContentKey => '-content'
866 );
867
868 produces the following data structure:
869
870 {
871 dir => {
872 prefix => '/usr/local/apache',
873 exec_prefix => '/usr/local/apache',
874 bindir => '/usr/local/apache/bin',
875 }
876 }
877
878 XMLDecl => 1 or XMLDecl => 'string' # out - handy
879 If you want the output from "XMLout()" to start with the optional XML
880 declaration, simply set the option to '1'. The default XML declaration
881 is:
882
883 <?xml version='1.0' standalone='yes'?>
884
885 If you want some other string (for example to declare an encoding
886 value), set the value of this option to the complete string you
887 require.
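
For example, the following sketch (with an illustrative data structure and root name) produces one document with the default declaration and one with a caller-supplied declaration string:

```perl
use strict;
use warnings;
use XML::Simple;

my $data   = { name => 'sahara', osname => 'solaris' };
my @common = (RootName => 'server', KeyAttr => []);

# XMLDecl => 1 selects the default declaration shown above.
my $with_default = XMLout($data, @common, XMLDecl => 1);

# Any other true value is used as the complete declaration string.
my $with_custom = XMLout(
    $data, @common,
    XMLDecl => '<?xml version="1.0" encoding="UTF-8"?>',
);

print $with_default, $with_custom;
```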
888
890 The procedural interface is both simple and convenient; however, there
891 are a couple of reasons why you might prefer to use the object oriented
892 (OO) interface:
893
894 • to define a set of default values which should be used on all
895 subsequent calls to "XMLin()" or "XMLout()"
896
897 • to override methods in XML::Simple to provide customised behaviour
898
899 The default values for the options described above are unlikely to suit
900 everyone. The OO interface allows you to effectively override
901 XML::Simple's defaults with your preferred values. It works like this:
902
903 First create an XML::Simple parser object with your preferred defaults:
904
905 my $xs = XML::Simple->new(ForceArray => 1, KeepRoot => 1);
906
907 then call "XMLin()" or "XMLout()" as a method of that object:
908
909 my $ref = $xs->XMLin($xml);
910 my $xml = $xs->XMLout($ref);
911
912 You can also specify options when you make the method calls and these
913 values will be merged with the values specified when the object was
914 created. Values specified in a method call take precedence.
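
A minimal sketch of that precedence rule, using a made-up one-element document. The per-call value applies to that call only; the object's defaults are used again on the next call.

```perl
use strict;
use warnings;
use XML::Simple;

# Object-level defaults.
my $xs  = XML::Simple->new(ForceArray => 1, KeyAttr => []);
my $xml = '<opt><item>a</item></opt>';

my $forced = $xs->XMLin($xml);                   # item => [ 'a' ]
my $plain  = $xs->XMLin($xml, ForceArray => 0);  # item => 'a' (call overrides default)
my $again  = $xs->xml_in($xml);                  # defaults apply again: item => [ 'a' ]

print ref($forced->{item}) || 'scalar', "\n";
print ref($plain->{item})  || 'scalar', "\n";
```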
915
916 Note: when called as methods, the "XMLin()" and "XMLout()" routines may
917 be called as "xml_in()" or "xml_out()". The method names are aliased
918 so the only difference is the aesthetics.
919
920 Parsing Methods
921 You can explicitly call one of the following methods rather than rely
922 on the "xml_in()" method automatically determining whether the target
923 to be parsed is a string, a file or a filehandle:
924
925 parse_string(text)
926 Works exactly like the "xml_in()" method but assumes the first
927 argument is a string of XML (or a reference to a scalar containing
928 a string of XML).
929
930 parse_file(filename)
931 Works exactly like the "xml_in()" method but assumes the first
932 argument is the name of a file containing XML.
933
934 parse_fh(file_handle)
935 Works exactly like the "xml_in()" method but assumes the first
936 argument is a filehandle which can be read to get XML.
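
As an illustration, "parse_string()" can be handed either the XML text itself or a reference to it, which removes any guesswork about whether the argument is a filename. This sketch assumes a version of XML::Simple recent enough to provide the parsing methods.

```perl
use strict;
use warnings;
use XML::Simple;

my $xs  = XML::Simple->new(KeyAttr => [], ForceArray => 0);
my $xml = '<opt username="testuser" password="frodo" />';

my $from_string = $xs->parse_string($xml);     # plain string
my $from_ref    = $xs->parse_string(\$xml);    # reference to a scalar

print "$from_string->{username}\n";
```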
937
938 Hook Methods
939 You can make your own class which inherits from XML::Simple and
940 overrides certain behaviours. The following methods may provide useful
941 'hooks' upon which to hang your modified behaviour. You may find other
942 undocumented methods by examining the source, but those may be subject
943 to change in future releases.
944
945 new_xml_parser()
946 This method will be called when a new XML::Parser object must be
947 constructed (either because XML::SAX is not installed or
948 XML::Parser is preferred).
949
950 handle_options(direction, name => value ...)
951 This method will be called when one of the parsing methods or the
952 "XMLout()" method is called. The initial argument will be a string
953 (either 'in' or 'out') and the remaining arguments will be name
954 value pairs.
955
956 default_config_file()
957 Calculates and returns the name of the file which should be parsed
958 if no filename is passed to "XMLin()" (default: "$0.xml").
959
960 build_simple_tree(filename, string)
961 Called from "XMLin()" or any of the parsing methods. Takes either
962 a file name as the first argument or "undef" followed by a 'string'
963 as the second argument. Returns a simple tree data structure. You
964 could override this method to apply your own transformations before
965 the data structure is returned to the caller.
966
967 new_hashref()
968 When the 'simple tree' data structure is being built, this method
969 will be called to create any required anonymous hashrefs.
970
971 sorted_keys(name, hashref)
972 Called when "XMLout()" is translating a hashref to XML. This
973 routine returns a list of hash keys in the order that the
974 corresponding attributes/elements should appear in the output.
975
976 escape_value(string)
977 Called from "XMLout()", takes a string and returns a copy of the
978 string with XML character escaping rules applied.
979
980 escape_attr(string)
981 Called from "XMLout()", to handle attribute values. By default,
982 just calls "escape_value()", but you can override this method if
983 you want attributes escaped differently than text content.
984
985 numeric_escape(string)
986 Called from "escape_value()", to handle non-ASCII characters
987 (depending on the value of the NumericEscape option).
988
989 copy_hash(hashref, extra_key => value, ...)
990 Called from "XMLout()", when 'unfolding' a hash of hashes into an
991 array of hashes. You might wish to override this method if you're
992 using tied hashes and don't want them to get untied.
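
As a sketch of how a hook might be used, the hypothetical subclass below overrides "sorted_keys()" so that "XMLout()" emits attributes in reverse alphabetical order. The package name and the ordering rule are assumptions made for the example; only the method name and its arguments come from the list above.

```perl
use strict;
use warnings;

{
    package My::ReverseSimple;    # hypothetical subclass name
    use parent 'XML::Simple';

    # Called by XMLout() with the element name and the hashref being
    # serialised; returns the keys in the order they should be emitted.
    sub sorted_keys {
        my ($self, $name, $hashref) = @_;
        return reverse sort keys %$hashref;
    }
}

my $xs  = My::ReverseSimple->new(KeyAttr => [], RootName => 'opt');
my $xml = $xs->XMLout({ alpha => 1, beta => 2 });
print $xml;
```

With the stock implementation the attributes would be written in sorted order (alpha, then beta); with the override, beta is written first.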
993
994 Cache Methods
995 XML::Simple implements three caching schemes ('storable', 'memshare'
996 and 'memcopy'). You can implement a custom caching scheme by
997 implementing two methods - one for reading from the cache and one for
998 writing to it.
999
1000 For example, you might implement a new 'dbm' scheme that stores cached
1001 data structures using the MLDBM module. First, you would add a
1002 "cache_read_dbm()" method which accepted a filename for use as a lookup
1003 key and returned a data structure on success, or undef on failure.
1004 Then, you would implement a "cache_write_dbm()" method which accepted a
1005 data structure and a filename.
1006
1007 You would use this caching scheme by specifying the option:
1008
1009 Cache => [ 'dbm' ]
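
The same pattern can be sketched without MLDBM. The hypothetical 'inproc' scheme below simply keeps parsed structures in a hash for the life of the process. The package name, the scheme name and the call counters are all inventions for this example, and unlike the built-in schemes it makes no attempt to notice that the file has changed on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

{
    package My::CachingSimple;    # hypothetical subclass name
    use parent 'XML::Simple';

    my %cache;                            # filename => parsed data structure
    our ($reads, $writes) = (0, 0);       # counters, for demonstration only

    # Return the cached structure for $filename, or undef on a miss.
    sub cache_read_inproc {
        my ($self, $filename) = @_;
        $reads++;
        return $cache{$filename};
    }

    # Store a freshly parsed structure under $filename.
    sub cache_write_inproc {
        my ($self, $data, $filename) = @_;
        $writes++;
        $cache{$filename} = $data;
        return;
    }
}

# Caching only applies to files, so write the sample XML to a temp file.
my ($fh, $file) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<opt colour="red" />';
close $fh;

my $xs = My::CachingSimple->new(
    Cache => ['inproc'], KeyAttr => [], ForceArray => 0,
);
my $first  = $xs->XMLin($file);   # miss: file is parsed, then cached
my $second = $xs->XMLin($file);   # hit: served by cache_read_inproc()

print "writes: $My::CachingSimple::writes\n";
```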
1010
1012 If you import the XML::Simple routines like this:
1013
1014 use XML::Simple qw(:strict);
1015
1016 the following common mistakes will be detected and treated as fatal
1017 errors:
1018
1019 • Failing to explicitly set the "KeyAttr" option - if you can't be
1020 bothered reading about this option, turn it off with: KeyAttr => [
1021 ]
1022
1023 • Failing to explicitly set the "ForceArray" option - if you can't be
1024 bothered reading about this option, set it to the safest mode with:
1025 ForceArray => 1
1026
1027 • Setting ForceArray to an array, but failing to list all the
1028 elements from the KeyAttr hash.
1029
1030 • Data error - KeyAttr is set to say { part => 'partnum' } but the
1031 XML contains one or more <part> elements without a 'partnum'
1032 attribute (or nested element). Note: if strict mode is not set but
1033 "use warnings;" is in force, this condition triggers a warning.
1034
1035 • Data error - as above, but non-unique values are present in the key
1036 attribute (eg: more than one <part> element with the same partnum).
1037 This will also trigger a warning if strict mode is not enabled.
1038
1039 • Data error - as above, but value of key attribute (eg: partnum) is
1040 not a scalar string (due to nested elements etc). This will also
1041 trigger a warning if strict mode is not enabled.
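
A short sketch of the first two checks in action, using a made-up <part> document. The first call omits both options and so dies under strict mode; the second supplies them and lists every "KeyAttr" element in "ForceArray".

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><part partnum="555-1234" desc="widget" /></opt>';

# Neither KeyAttr nor ForceArray is given, so strict mode makes this fatal.
my $lax   = eval { XMLin($xml) };
my $error = $@;
print "strict mode said: $error" if $error;

# Both options given, and 'part' appears in KeyAttr and in ForceArray.
my $ref = XMLin(
    $xml,
    KeyAttr    => { part => 'partnum' },
    ForceArray => ['part'],
);
print $ref->{part}{'555-1234'}{desc}, "\n";   # widget
```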
1042
1044 From version 1.08_01, XML::Simple includes support for SAX (the Simple
1045 API for XML) - specifically SAX2.
1046
1047 In a typical SAX application, an XML parser (or SAX 'driver') module
1048 generates SAX events (start of element, character data, end of element,
1049 etc) as it parses an XML document and a 'handler' module processes the
1050 events to extract the required data. This simple model allows for some
1051 interesting and powerful possibilities:
1052
1053 • Applications written to the SAX API can extract data from huge XML
1054 documents without the memory overheads of a DOM or tree API.
1055
1056 • The SAX API allows for plug and play interchange of parser modules
1057 without having to change your code to fit a new module's API. A
1058 number of SAX parsers are available with capabilities ranging from
1059 extreme portability to blazing performance.
1060
1061 • A SAX 'filter' module can implement both a handler interface for
1062 receiving data and a generator interface for passing modified data
1063 on to a downstream handler. Filters can be chained together in
1064 'pipelines'.
1065
1066 • One filter module might split a data stream to direct data to two
1067 or more downstream handlers.
1068
1069 • Generating SAX events is not the exclusive preserve of XML parsing
1070 modules. For example, a module might extract data from a
1071 relational database using DBI and pass it on to a SAX pipeline for
1072 filtering and formatting.
1073
1074 XML::Simple can operate at either end of a SAX pipeline. For example,
1075 you can take a data structure in the form of a hashref and pass it into
1076 a SAX pipeline using the 'Handler' option on "XMLout()":
1077
1078 use XML::Simple;
1079 use Some::SAX::Filter;
1080 use XML::SAX::Writer;
1081
1082 my $ref = {
1083 .... # your data here
1084 };
1085
1086 my $writer = XML::SAX::Writer->new();
1087 my $filter = Some::SAX::Filter->new(Handler => $writer);
1088 my $simple = XML::Simple->new(Handler => $filter);
1089 $simple->XMLout($ref);
1090
1091 You can also put XML::Simple at the opposite end of the pipeline to
1092 take advantage of the simple 'tree' data structure once the relevant
1093 data has been isolated through filtering:
1094
1095 use XML::SAX;
1096 use Some::SAX::Filter;
1097 use XML::Simple;
1098
1099 my $simple = XML::Simple->new(ForceArray => 1, KeyAttr => ['partnum']);
1100 my $filter = Some::SAX::Filter->new(Handler => $simple);
1101 my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1102
1103 my $ref = $parser->parse_uri('some_huge_file.xml');
1104
1105 print $ref->{part}->{'555-1234'};
1106
1107 You can build a filter by using an XML::Simple object as a handler and
1108 setting its DataHandler option to point to a routine which takes the
1109 resulting tree, modifies it and sends it off as SAX events to a
1110 downstream handler:
1111
1112 my $writer = XML::SAX::Writer->new();
1113 my $filter = XML::Simple->new(
1114 DataHandler => sub {
1115 my $simple = shift;
1116 my $data = shift;
1117
1118 # Modify $data here
1119
1120 $simple->XMLout($data, Handler => $writer);
1121 }
1122 );
1123 my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1124
1125 $parser->parse_uri($filename);
1126
1127 Note: In this last example, the 'Handler' option was specified in the
1128 call to "XMLout()" but it could also have been specified in the
1129 constructor.
1130
1132 If you don't care which parser module XML::Simple uses then skip this
1133 section entirely (it looks more complicated than it really is).
1134
1135 XML::Simple will default to using a SAX parser if one is available or
1136 XML::Parser if SAX is not available.
1137
1138 You can dictate which parser module is used by setting either the
1139 environment variable 'XML_SIMPLE_PREFERRED_PARSER' or the package
1140 variable $XML::Simple::PREFERRED_PARSER to contain the module name.
1141 The following rules are used:
1142
1143 • The package variable takes precedence over the environment variable
1144 if both are defined. To force XML::Simple to ignore the
1145 environment settings and use its default rules, you can set the
1146 package variable to an empty string.
1147
1148 • If the 'preferred parser' is set to the string 'XML::Parser', then
1149 XML::Parser will be used (or "XMLin()" will die if XML::Parser is
1150 not installed).
1151
1152 • If the 'preferred parser' is set to some other value, then it is
1153 assumed to be the name of a SAX parser module and is passed to
1154 XML::SAX::ParserFactory. If XML::SAX is not installed, or the
1155 requested parser module is not installed, then "XMLin()" will die.
1156
1157 • If the 'preferred parser' is not defined at all (the normal default
1158 state), an attempt will be made to load XML::SAX. If XML::SAX is
1159 installed, then a parser module will be selected according to
1160 XML::SAX::ParserFactory's normal rules (which typically means the
1161 last SAX parser installed).
1162
1163 • If the 'preferred parser' is not defined and XML::SAX is not
1164 installed, then XML::Parser will be used. "XMLin()" will die if
1165 XML::Parser is not installed.
1166
1167 Note: The XML::SAX distribution includes an XML parser written entirely
1168 in Perl. It is very portable but it is not very fast. You should
1169 consider installing XML::LibXML or XML::SAX::Expat if they are
1170 available for your platform.
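
The package variable can be set from your own code before calling "XMLin()". The sketch below asks for XML::Parser and wraps the call in an eval, since (as noted above) "XMLin()" will die if that module is not installed. It then sets the variable to an empty string to return to the default rules.

```perl
use strict;
use warnings;
use XML::Simple;

my @opts = (KeyAttr => [], ForceArray => 0);

# Ask for XML::Parser explicitly; this takes precedence over the
# XML_SIMPLE_PREFERRED_PARSER environment variable.
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

my $ref = eval { XMLin('<opt a="1" />', @opts) };
my $err = $@;
print "XML::Parser could not be used: $err" if $err;

# An empty string means: ignore the environment, use the default rules.
$XML::Simple::PREFERRED_PARSER = '';
my $again = XMLin('<opt a="1" />', @opts);
print "$again->{a}\n";
```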
1171
1173 The XML standard is very clear on the issue of non-compliant documents.
1174 An error in parsing any single element (for example a missing end tag)
1175 must cause the whole document to be rejected. XML::Simple will die
1176 with an appropriate message if it encounters a parsing error.
1177
1178 If dying is not appropriate for your application, you should arrange to
1179 call "XMLin()" in an eval block and look for errors in $@. eg:
1180
1181 my $config = eval { XMLin() };
1182 PopUpMessage($@) if($@);
1183
1184 Note, there is a common misconception that use of eval will
1185 significantly slow down a script. While that may be true when the code
1186 being eval'd is in a string, it is not true of code like the sample
1187 above.
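
A slightly fuller, self-contained variant of the same idea is sketched below. The malformed document and the fallback to an empty configuration are assumptions made for the example, and the exact wording of the error message depends on which parser module is in use.

```perl
use strict;
use warnings;
use XML::Simple;

# The <server> start tag is never closed, so this is not well-formed XML.
my $bad_xml = '<config><server name="sahara"></config>';

my $config = eval { XMLin($bad_xml, KeyAttr => [], ForceArray => 0) };
my $err    = $@;

if ($err) {
    print "Could not parse configuration: $err\n";
    $config = {};    # fall back to an empty configuration
}
```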
1188
1190 When "XMLin()" reads the following very simple piece of XML:
1191
1192 <opt username="testuser" password="frodo"></opt>
1193
1194 it returns the following data structure:
1195
1196 {
1197 'username' => 'testuser',
1198 'password' => 'frodo'
1199 }
1200
1201 The identical result could have been produced with this alternative
1202 XML:
1203
1204 <opt username="testuser" password="frodo" />
1205
1206 Or this (although see 'ForceArray' option for variations):
1207
1208 <opt>
1209 <username>testuser</username>
1210 <password>frodo</password>
1211 </opt>
1212
1213 Repeated nested elements are represented as anonymous arrays:
1214
1215 <opt>
1216 <person firstname="Joe" lastname="Smith">
1217 <email>joe@smith.com</email>
1218 <email>jsmith@yahoo.com</email>
1219 </person>
1220 <person firstname="Bob" lastname="Smith">
1221 <email>bob@smith.com</email>
1222 </person>
1223 </opt>
1224
1225 {
1226 'person' => [
1227 {
1228 'email' => [
1229 'joe@smith.com',
1230 'jsmith@yahoo.com'
1231 ],
1232 'firstname' => 'Joe',
1233 'lastname' => 'Smith'
1234 },
1235 {
1236 'email' => 'bob@smith.com',
1237 'firstname' => 'Bob',
1238 'lastname' => 'Smith'
1239 }
1240 ]
1241 }
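
Note that in the structure above Joe's 'email' is an arrayref while Bob's is a plain scalar. A minimal sketch of how the "ForceArray" option can make the two consistent, so that calling code can always dereference an array:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<opt>
  <person firstname="Joe" lastname="Smith">
    <email>joe@smith.com</email>
    <email>jsmith@yahoo.com</email>
  </person>
  <person firstname="Bob" lastname="Smith">
    <email>bob@smith.com</email>
  </person>
</opt>
XML

# Force <email> into an arrayref even when it occurs only once.
my $ref = XMLin($xml, ForceArray => ['email'], KeyAttr => []);

for my $person (@{ $ref->{person} }) {
    print "$person->{firstname}: @{ $person->{email} }\n";
}
```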
1242
1243 Nested elements with a recognised key attribute are transformed
1244 (folded) from an array into a hash keyed on the value of that attribute
1245 (see the "KeyAttr" option):
1246
1247 <opt>
1248 <person key="jsmith" firstname="Joe" lastname="Smith" />
1249 <person key="tsmith" firstname="Tom" lastname="Smith" />
1250 <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
1251 </opt>
1252
1253 {
1254 'person' => {
1255 'jbloggs' => {
1256 'firstname' => 'Joe',
1257 'lastname' => 'Bloggs'
1258 },
1259 'tsmith' => {
1260 'firstname' => 'Tom',
1261 'lastname' => 'Smith'
1262 },
1263 'jsmith' => {
1264 'firstname' => 'Joe',
1265 'lastname' => 'Smith'
1266 }
1267 }
1268 }
1269
1270 The <anon> tag can be used to form anonymous arrays:
1271
1272 <opt>
1273 <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
1274 <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
1275 <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
1276 <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
1277 </opt>
1278
1279 {
1280 'head' => [
1281 [ 'Col 1', 'Col 2', 'Col 3' ]
1282 ],
1283 'data' => [
1284 [ 'R1C1', 'R1C2', 'R1C3' ],
1285 [ 'R2C1', 'R2C2', 'R2C3' ],
1286 [ 'R3C1', 'R3C2', 'R3C3' ]
1287 ]
1288 }
1289
1290 Anonymous arrays can be nested to arbitrary levels and as a special
1291 case, if the surrounding tags for an XML document contain only an
1292 anonymous array the arrayref will be returned directly rather than the
1293 usual hashref:
1294
1295 <opt>
1296 <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
1297 <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
1298 <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
1299 </opt>
1300
1301 [
1302 [ 'Col 1', 'Col 2' ],
1303 [ 'R1C1', 'R1C2' ],
1304 [ 'R2C1', 'R2C2' ]
1305 ]
1306
1307 Elements which only contain text content will simply be represented as
1308 a scalar. Where an element has both attributes and text content, the
1309 element will be represented as a hashref with the text content in the
1310 'content' key (see the "ContentKey" option):
1311
1312 <opt>
1313 <one>first</one>
1314 <two attr="value">second</two>
1315 </opt>
1316
1317 {
1318 'one' => 'first',
1319 'two' => { 'attr' => 'value', 'content' => 'second' }
1320 }
1321
1322 Mixed content (elements which contain both text content and nested
1323 elements) will not be represented in a useful way - element order
1324 and significant whitespace will be lost. If you need to work with
1325 mixed content, then XML::Simple is not the right tool for your job -
1326 check out the next section.
1327
1329 XML::Simple is able to present a simple API because it makes some
1330 assumptions on your behalf. These include:
1331
1332 • You're not interested in text content consisting only of whitespace
1333
1334 • You don't mind that when things get slurped into a hash the order
1335 is lost
1336
1337 • You don't want fine-grained control of the formatting of generated
1338 XML
1339
1340 • You would never use a hash key that was not a legal XML element
1341 name
1342
1343 • You don't need help converting between different encodings
1344
1345 In a serious XML project, you'll probably outgrow these assumptions
1346 fairly quickly. This section of the document used to offer some advice
1347 on choosing a more powerful option. That advice has now grown into the
1348 'Perl-XML FAQ' document which you can find at:
1349 <http://perl-xml.sourceforge.net/faq/>
1350
1351 The advice in the FAQ boils down to a quick explanation of tree versus
1352 event based parsers and then recommends:
1353
1354 For event based parsing, use SAX (do not set out to write any new code
1355 for XML::Parser's handler API - it is obsolete).
1356
1357 For tree-based parsing, you could choose between the 'Perlish' approach
1358 of XML::Twig and more standards based DOM implementations - preferably
1359 one with XPath support such as XML::LibXML.
1360
1362 XML::Simple requires either XML::Parser or XML::SAX.
1363
1364 To generate documents with namespaces, XML::NamespaceSupport is
1365 required.
1366
1367 The optional caching functions require Storable.
1368
1369 Answers to Frequently Asked Questions about XML::Simple are bundled
1370 with this distribution as: XML::Simple::FAQ
1371
1373 Copyright 1999-2004 Grant McLean <grantm@cpan.org>
1374
1375 This library is free software; you can redistribute it and/or modify it
1376 under the same terms as Perl itself.
1377
1378
1379
1380perl v5.34.0 2022-01-21 XML::Simple(3)