1XML::Simple(3) User Contributed Perl Documentation XML::Simple(3)
2
3
4
NAME
6 XML::Simple - Easily read/write XML (esp config files)
7
SYNOPSIS
9 use XML::Simple qw(:strict);
10
11 my $ref = XMLin([<xml file or string>] [, <options>]);
12
13 my $xml = XMLout($hashref [, <options>]);
14
15 Or the object oriented way:
16
17 use XML::Simple qw(:strict);
18
19 my $xs = XML::Simple->new([<options>]);
20
21 my $ref = $xs->XMLin([<xml file or string>] [, <options>]);
22
23 my $xml = $xs->XMLout($hashref [, <options>]);
24
25 (or see "SAX SUPPORT" for 'the SAX way').
26
27 Note, in these examples, the square brackets are used to denote
28 optional items not to imply items should be supplied in arrayrefs.
29
STATUS OF THIS MODULE
31 The use of this module in new code is discouraged. Other modules are
32 available which provide more straightforward and consistent interfaces.
33 In particular, XML::LibXML is highly recommended.
34
35 The major problems with this module are the large number of options and
36 the arbitrary ways in which these options interact - often with
37 unexpected results.
38
39 Patches with bug fixes and documentation fixes are welcome, but new
40 features are unlikely to be added.
41
QUICK START
43 Say you have a script called foo and a file of configuration options
44 called foo.xml containing the following:
45
46 <config logdir="/var/log/foo/" debugfile="/tmp/foo.debug">
47 <server name="sahara" osname="solaris" osversion="2.6">
48 <address>10.0.0.101</address>
49 <address>10.0.1.101</address>
50 </server>
51 <server name="gobi" osname="irix" osversion="6.5">
52 <address>10.0.0.102</address>
53 </server>
54 <server name="kalahari" osname="linux" osversion="2.0.34">
55 <address>10.0.0.103</address>
56 <address>10.0.1.103</address>
57 </server>
58 </config>
59
60 The following lines of code in foo:
61
62 use XML::Simple qw(:strict);
63
64 my $config = XMLin(undef, KeyAttr => { server => 'name' }, ForceArray => [ 'server', 'address' ]);
65
66 will 'slurp' the configuration options into the hashref $config
67 (because no filename or XML string was passed as the first argument to
68 "XMLin()" the name and location of the XML file will be inferred from
69 name and location of the script). You can dump out the contents of the
70 hashref using Data::Dumper:
71
72 use Data::Dumper;
73
74 print Dumper($config);
75
76 which will produce something like this (formatting has been adjusted
77 for brevity):
78
79 {
80 'logdir' => '/var/log/foo/',
81 'debugfile' => '/tmp/foo.debug',
82 'server' => {
83 'sahara' => {
84 'osversion' => '2.6',
85 'osname' => 'solaris',
86 'address' => [ '10.0.0.101', '10.0.1.101' ]
87 },
88 'gobi' => {
89 'osversion' => '6.5',
90 'osname' => 'irix',
91 'address' => [ '10.0.0.102' ]
92 },
93 'kalahari' => {
94 'osversion' => '2.0.34',
95 'osname' => 'linux',
96 'address' => [ '10.0.0.103', '10.0.1.103' ]
97 }
98 }
99 }
100
101 Your script could then access the name of the log directory like this:
102
103 print $config->{logdir};
104
105 similarly, the second address on the server 'kalahari' could be
106 referenced as:
107
108 print $config->{server}->{kalahari}->{address}->[1];
109
110 Note: If the mapping between the output of Data::Dumper and the print
111 statements above is not obvious to you, then please refer to the
112 'references' tutorial (AKA: "Mark's very short tutorial about
113 references") at perlreftut.
114
115 In this example, the "ForceArray" option was used to list elements that
116 might occur multiple times and should therefore be represented as
117 arrayrefs (even when only one element is present).
118
119 The "KeyAttr" option was used to indicate that each "<server>" element
120 has a unique identifier in the "name" attribute. This allows you to
121 index directly to a particular server record using the name as a hash
122 key (as shown above).
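
As a minimal sketch of the lookups that "KeyAttr" and "ForceArray" make possible, the following parses a cut-down copy of the configuration from an inline string (rather than from foo.xml) and lists the addresses of each server. The variable names are illustrative only.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

# A cut-down copy of the foo.xml example, supplied as a string
my $xml = <<'XML';
<config logdir="/var/log/foo/">
  <server name="sahara" osname="solaris">
    <address>10.0.0.101</address>
    <address>10.0.1.101</address>
  </server>
  <server name="gobi" osname="irix">
    <address>10.0.0.102</address>
  </server>
</config>
XML

my $config = XMLin($xml,
    KeyAttr    => { server => 'name' },
    ForceArray => [ 'server', 'address' ],
);

# Servers are keyed by name; 'address' is always an arrayref,
# even for 'gobi' which has only one address
my @report;
foreach my $name (sort keys %{ $config->{server} }) {
    my $addresses = $config->{server}{$name}{address};
    push @report, "$name: " . join(', ', @$addresses);
}
print "$_\n" for @report;
```

Because 'address' is listed in "ForceArray", the loop does not need a special case for servers with a single address.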
123
124 For simple requirements, that's really all there is to it. If you want
125 to store your XML in a different directory or file, or pass it in as a
126 string or even pass it in via some derivative of an IO::Handle, you'll
127 need to check out "OPTIONS". If you want to turn off or tweak the
128 array folding feature (that neat little transformation that produced
129 $config->{server}) you'll find options for that as well.
130
131 If you want to generate XML (for example to write a modified version of
132 $config back out as XML), check out "XMLout()".
133
134 If your needs are not so simple, this may not be the module for you.
135 In that case, you might want to read "WHERE TO FROM HERE?".
136
DESCRIPTION
138 The XML::Simple module provides a simple API layer on top of an
139 underlying XML parsing module (either XML::Parser or one of the SAX2
140 parser modules). Two functions are exported: "XMLin()" and "XMLout()".
141 Note: you can explicitly request the lower case versions of the function
142 names: "xml_in()" and "xml_out()".
143
144 The simplest approach is to call these two functions directly, but an
145 optional object oriented interface (see "OPTIONAL OO INTERFACE" below)
146 allows them to be called as methods of an XML::Simple object. The
147 object interface can also be used at either end of a SAX pipeline.
148
149 XMLin()
150 Parses XML formatted data and returns a reference to a data structure
151 which contains the same information in a more readily accessible form.
152 (Skip down to "EXAMPLES" below, for more sample code).
153
154 "XMLin()" accepts an optional XML specifier followed by zero or more
155 'name => value' option pairs. The XML specifier can be one of the
156 following:
157
158 A filename
159 If the filename contains no directory components "XMLin()" will
160 look for the file in each directory in the SearchPath (see
161 "OPTIONS" below) or in the current directory if the SearchPath
162 option is not defined. eg:
163
164 $ref = XMLin('/etc/params.xml');
165
166 Note, the filename '-' can be used to parse from STDIN.
167
168 undef
169 If there is no XML specifier, "XMLin()" will check the script
170 directory and each of the SearchPath directories for a file with
171 the same name as the script but with the extension '.xml'. Note:
172 if you wish to specify options, you must specify the value 'undef'.
173 eg:
174
175 $ref = XMLin(undef, ForceArray => 1);
176
177 A string of XML
178 A string containing XML (recognised by the presence of '<' and '>'
179 characters) will be parsed directly. eg:
180
181 $ref = XMLin('<opt username="bob" password="flurp" />');
182
183 An IO::Handle object
184 An IO::Handle object will be read to EOF and its contents parsed.
185 eg:
186
187 $fh = IO::File->new('/etc/params.xml');
188 $ref = XMLin($fh);
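
The following sketch passes the same document to "XMLin()" as a string, as a filename and as an IO::Handle object. The temporary file is an assumption made so the example is self-contained; the options are given explicitly because strict mode requires them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::File;
use XML::Simple qw(:strict);

my $xml = '<opt username="bob" password="flurp" />';

# Write the same XML to a temporary file so it can also be read
# by name and through a file handle
my ($tmp_fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$tmp_fh} $xml;
close $tmp_fh or die "close($path): $!";

my %opts = (KeyAttr => [], ForceArray => 0);

my $from_string = XMLin($xml,  %opts);   # a string of XML
my $from_file   = XMLin($path, %opts);   # a filename with a directory part

my $fh = IO::File->new($path, 'r') or die "open($path): $!";
my $from_handle = XMLin($fh, %opts);     # an IO::Handle object
$fh->close;

print "$_->{username}\n" for $from_string, $from_file, $from_handle;
```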
189
190 XMLout()
191 Takes a data structure (generally a hashref) and returns an XML
192 encoding of that structure. If the resulting XML is parsed using
193 "XMLin()", it should return a data structure equivalent to the original
194 (see caveats below).
195
196 The "XMLout()" function can also be used to output the XML as SAX
197 events (see the "Handler" option and "SAX SUPPORT" for more details).
198
199 When translating hashes to XML, hash keys which have a leading '-' will
200 be silently skipped. This is the approved method for marking elements
201 of a data structure which should be ignored by "XMLout". (Note: If
202 these items were not skipped the key names would be emitted as element
203 or attribute names with a leading '-' which would not be valid XML).
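
A small sketch of the leading '-' convention follows. The '-cached' key is a hypothetical bookkeeping field invented for the example.

```perl
use strict;
use warnings;
use XML::Simple;

my $record = {
    name      => 'sahara',
    osname    => 'solaris',
    '-cached' => 1,          # bookkeeping only; not meant for the XML
};

# KeyAttr => [] keeps array (un)folding out of the picture
my $xml = XMLout($record, KeyAttr => [], RootName => 'server');
print $xml;
```

The output should contain the 'name' and 'osname' attributes but no trace of '-cached'.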
204
205 Caveats
206 Some care is required in creating data structures which will be passed
207 to "XMLout()". Hash keys from the data structure will be encoded as
208 either XML element names or attribute names. Therefore, you should use
209 hash key names which conform to the relatively strict XML naming rules:
210
211 Names in XML must begin with a letter. The remaining characters may be
212 letters, digits, hyphens (-), underscores (_) or full stops (.). It is
213 also allowable to include one colon (:) in an element name but this
214 should only be used when working with namespaces (XML::Simple can only
215 usefully work with namespaces when teamed with a SAX Parser).
216
217 You can use other punctuation characters in hash values (just not in
218 hash keys) however XML::Simple does not support dumping binary data.
219
220 If you break these rules, the current implementation of "XMLout()" will
221 simply emit non-compliant XML which will be rejected if you try to read
222 it back in. (A later version of XML::Simple might take a more
223 proactive approach).
224
225 Note also that although you can nest hashes and arrays to arbitrary
226 levels, circular data structures are not supported and will cause
227 "XMLout()" to die.
228
229 If you wish to 'round-trip' arbitrary data structures from Perl to XML
230 and back to Perl, then you should probably disable array folding (using
231 the KeyAttr option) both with "XMLout()" and with "XMLin()". If you
232 still don't get the expected results, you may prefer to use XML::Dumper
233 which is designed for exactly that purpose.
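
As a rough illustration of a round trip with array folding disabled at both ends, consider this sketch. The data structure is made up for the example, and Data::Dumper is used only as a convenient way to compare the two structures.

```perl
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;

my $data = {
    item => [
        { name => 'a', value => '1' },
        { name => 'b', value => '2' },
    ],
};

# An empty KeyAttr list disables unfolding on output and folding on input
my $xml  = XMLout($data, KeyAttr => []);
my $copy = XMLin($xml, KeyAttr => [], ForceArray => [ 'item' ]);

# Compare the original and the copy using a stable key order
$Data::Dumper::Sortkeys = 1;
my $same = Dumper($data) eq Dumper($copy);
print $same ? "round trip OK\n" : "structures differ\n";
```

Had the default "KeyAttr" been left in force, the 'name' attribute would have been used to fold the list of items into a hash on the way back in.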
234
235 Refer to "WHERE TO FROM HERE?" if "XMLout()" is too simple for your
236 needs.
237
OPTIONS
239 XML::Simple supports a number of options (in fact as each release of
240 XML::Simple adds more options, the module's claim to the name 'Simple'
241 becomes increasingly tenuous). If you find yourself repeatedly having
242 to specify the same options, you might like to investigate "OPTIONAL OO
243 INTERFACE" below.
244
245 If you can't be bothered reading the documentation, refer to "STRICT
246 MODE" to automatically catch common mistakes.
247
248 Because there are so many options, it's hard for new users to know
249 which ones are important, so here are the two you really need to know
250 about:
251
252 · check out "ForceArray" because you'll almost certainly want to turn
253 it on
254
255 · make sure you know what the "KeyAttr" option does and what its
256 default value is because it may surprise you otherwise (note in
257 particular that 'KeyAttr' affects both "XMLin" and "XMLout")
258
259 The option name headings below have a trailing 'comment' - a hash
260 followed by two pieces of metadata:
261
262 · Options are marked with 'in' if they are recognised by "XMLin()"
263 and 'out' if they are recognised by "XMLout()".
264
265 · Each option is also flagged to indicate whether it is:
266
267 'important' - don't use the module until you understand this one
268 'handy' - you can skip this on the first time through
269 'advanced' - you can skip this on the second time through
270 'SAX only' - don't worry about this unless you're using SAX (or
271 alternatively if you need this, you also need SAX)
272 'seldom used' - you'll probably never use this unless you were the
273 person that requested the feature
274
275 The options are listed alphabetically:
276
277 Note: option names are no longer case sensitive so you can use the
278 mixed case versions shown here; all lower case as required by versions
279 2.03 and earlier; or you can add underscores between the words (eg:
280 key_attr).
281
282 AttrIndent => 1 # out - handy
283 When you are using "XMLout()", enable this option to have attributes
284 printed one-per-line with sensible indentation rather than all on one
285 line.
286
287 Cache => [ cache schemes ] # in - advanced
288 Because loading the XML::Parser module and parsing an XML file can
289 consume a significant number of CPU cycles, it is often desirable to
290 cache the output of "XMLin()" for later reuse.
291
292 When parsing from a named file, XML::Simple supports a number of
293 caching schemes. The 'Cache' option may be used to specify one or more
294 schemes (using an anonymous array). Each scheme will be tried in turn
295 in the hope of finding a cached pre-parsed representation of the XML
296 file. If no cached copy is found, the file will be parsed and the
297 first cache scheme in the list will be used to save a copy of the
298 results. The following cache schemes have been implemented:
299
300 storable
301 Utilises Storable.pm to read/write a cache file with the same name
302 as the XML file but with the extension .stor
303
304 memshare
305 When a file is first parsed, a copy of the resulting data structure
306 is retained in memory in the XML::Simple module's namespace.
307 Subsequent calls to parse the same file will return a reference to
308 this structure. This cached version will persist only for the life
309 of the Perl interpreter (which in the case of mod_perl for example,
310 may be some significant time).
311
312 Because each caller receives a reference to the same data
313 structure, a change made by one caller will be visible to all. For
314 this reason, the reference returned should be treated as read-only.
315
316 memcopy
317 This scheme works identically to 'memshare' (above) except that
318 each caller receives a reference to a new data structure which is a
319 copy of the cached version. Copying the data structure will add a
320 little processing overhead, therefore this scheme should only be
321 used where the caller intends to modify the data structure (or
322 wishes to protect itself from others who might). This scheme uses
323 Storable.pm to perform the copy.
324
325 Warning! The memory-based caching schemes compare the timestamp on the
326 file to the time when it was last parsed. If the file is stored on an
327 NFS filesystem (or other network share) and the clock on the file
328 server is not exactly synchronised with the clock where your script is
329 run, updates to the source XML file may appear to be ignored.
330
331 ContentKey => 'keyname' # in+out - seldom used
332 When text content is parsed to a hash value, this option lets you
333 specify a name for the hash key to override the default 'content'. So
334 for example:
335
336 XMLin('<opt one="1">Text</opt>', ContentKey => 'text')
337
338 will parse to:
339
340 { 'one' => 1, 'text' => 'Text' }
341
342 instead of:
343
344 { 'one' => 1, 'content' => 'Text' }
345
346 "XMLout()" will also honour the value of this option when converting a
347 hashref to XML.
348
349 You can also prefix your selected key name with a '-' character to have
350 "XMLin()" try a little harder to eliminate unnecessary 'content' keys
351 after array folding. For example:
352
353 XMLin(
354 '<opt><item name="one">First</item><item name="two">Second</item></opt>',
355 KeyAttr => {item => 'name'},
356 ForceArray => [ 'item' ],
357 ContentKey => '-content'
358 )
359
360 will parse to:
361
362 {
363 'item' => {
364 'one' => 'First',
365 'two' => 'Second'
366 }
367 }
368
369 rather than this (without the '-'):
370
371 {
372 'item' => {
373 'one' => { 'content' => 'First' },
374 'two' => { 'content' => 'Second' }
375 }
376 }
377
378 DataHandler => code_ref # in - SAX only
379 When you use an XML::Simple object as a SAX handler, it will return a
380 'simple tree' data structure in the same format as "XMLin()" would
381 return. If this option is set (to a subroutine reference), then when
382 the tree is built the subroutine will be called and passed two
383 arguments: a reference to the XML::Simple object and a reference to the
384 data tree. The return value from the subroutine will be returned to
385 the SAX driver. (See "SAX SUPPORT" for more details).
386
387 ForceArray => 1 # in - important
388 This option should be set to '1' to force nested elements to be
389 represented as arrays even when there is only one. Eg, with ForceArray
390 enabled, this XML:
391
392 <opt>
393 <name>value</name>
394 </opt>
395
396 would parse to this:
397
398 {
399 'name' => [
400 'value'
401 ]
402 }
403
404 instead of this (the default):
405
406 {
407 'name' => 'value'
408 }
409
410 This option is especially useful if the data structure is likely to be
411 written back out as XML and the default behaviour of rolling single
412 nested elements up into attributes is not desirable.
413
414 If you are using the array folding feature, you should almost certainly
415 enable this option. If you do not, single nested elements will not be
416 parsed to arrays and therefore will not be candidates for folding to a
417 hash. (Given that the default value of 'KeyAttr' enables array
418 folding, the default value of this option should probably also have
419 been enabled too - sorry).
420
421 ForceArray => [ names ] # in - important
422 This alternative (and preferred) form of the 'ForceArray' option allows
423 you to specify a list of element names which should always be forced
424 into an array representation, rather than the 'all or nothing' approach
425 above.
426
427 It is also possible (since version 2.05) to include compiled regular
428 expressions in the list - any element names which match the pattern
429 will be forced to arrays. If the list contains only a single regex,
430 then it is not necessary to enclose it in an arrayref. Eg:
431
432 ForceArray => qr/_list$/
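
A short sketch of the regex form in use. The element names 'host_list' and 'owner' are invented for the example.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><host_list>sahara</host_list><owner>bob</owner></opt>';

# Only element names ending in "_list" are forced to arrayrefs
my $ref = XMLin($xml, KeyAttr => [], ForceArray => qr/_list$/);

print ref($ref->{host_list}), "\n";   # prints "ARRAY"
print $ref->{owner}, "\n";            # prints "bob"
```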
433
434 ForceContent => 1 # in - seldom used
435 When "XMLin()" parses elements which have text content as well as
436 attributes, the text content must be represented as a hash value rather
437 than a simple scalar. This option allows you to force text content to
438 always parse to a hash value even when there are no attributes. So for
439 example:
440
441 XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)
442
443 will parse to:
444
445 {
446 'x' => { 'content' => 'text1' },
447 'y' => { 'a' => 2, 'content' => 'text2' }
448 }
449
450 instead of:
451
452 {
453 'x' => 'text1',
454 'y' => { 'a' => 2, 'content' => 'text2' }
455 }
456
457 GroupTags => { grouping tag => grouped tag } # in+out - handy
458 You can use this option to eliminate extra levels of indirection in
459 your Perl data structure. For example this XML:
460
461 <opt>
462 <searchpath>
463 <dir>/usr/bin</dir>
464 <dir>/usr/local/bin</dir>
465 <dir>/usr/X11/bin</dir>
466 </searchpath>
467 </opt>
468
469 Would normally be read into a structure like this:
470
471 {
472 searchpath => {
473 dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
474 }
475 }
476
477 But when read in with the appropriate value for 'GroupTags':
478
479 my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });
480
481 It will return this simpler structure:
482
483 {
484 searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
485 }
486
487 The grouping element ("<searchpath>" in the example) must not contain
488 any attributes or elements other than the grouped element.
489
490 You can specify multiple 'grouping element' to 'grouped element'
491 mappings in the same hashref. If this option is combined with
492 "KeyAttr", the array folding will occur first and then the grouped
493 element names will be eliminated.
494
495 "XMLout" will also use the grouptag mappings to re-introduce the tags
496 around the grouped elements. Beware though that this will occur in all
497 places that the 'grouping tag' name occurs - you probably don't want to
498 use the same name for elements as well as attributes.
499
500 Handler => object_ref # out - SAX only
501 Use the 'Handler' option to have "XMLout()" generate SAX events rather
502 than returning a string of XML. For more details see "SAX SUPPORT"
503 below.
504
505 Note: the current implementation of this option generates a string of
506 XML and uses a SAX parser to translate it into SAX events. The normal
507 encoding rules apply here - your data must be UTF8 encoded unless you
508 specify an alternative encoding via the 'XMLDecl' option; and by the
509 time the data reaches the handler object, it will be in UTF8 form
510 regardless of the encoding you supply. A future implementation of this
511 option may generate the events directly.
512
513 KeepRoot => 1 # in+out - handy
514 In its attempt to return a data structure free of superfluous detail
515 and unnecessary levels of indirection, "XMLin()" normally discards the
516 root element name. Setting the 'KeepRoot' option to '1' will cause the
517 root element name to be retained. So after executing this code:
518
519 $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1);
520
521 You'll be able to reference the tempdir as
522 "$config->{config}->{tempdir}" instead of the default
523 "$config->{tempdir}".
524
525 Similarly, setting the 'KeepRoot' option to '1' will tell "XMLout()"
526 that the data structure already contains a root element name and it is
527 not necessary to add another.
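
The output side can be sketched as follows. The data is illustrative; the outer hash has a single key, which is taken as the root element name.

```perl
use strict;
use warnings;
use XML::Simple;

# The single top-level key supplies the root element name
my $data = { config => { tempdir => '/tmp', logdir => '/var/log/foo/' } };

my $xml = XMLout($data, KeepRoot => 1, KeyAttr => []);
print $xml;

# Reading it back with KeepRoot retains the <config> level
my $back = XMLin($xml, KeepRoot => 1, KeyAttr => [], ForceArray => 0);
print $back->{config}{tempdir}, "\n";   # prints "/tmp"
```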
528
529 KeyAttr => [ list ] # in+out - important
530 This option controls the 'array folding' feature which translates
531 nested elements from an array to a hash. It also controls the
532 'unfolding' of hashes to arrays.
533
534 For example, this XML:
535
536 <opt>
537 <user login="grep" fullname="Gary R Epstein" />
538 <user login="stty" fullname="Simon T Tyson" />
539 </opt>
540
541 would, by default, parse to this:
542
543 {
544 'user' => [
545 {
546 'login' => 'grep',
547 'fullname' => 'Gary R Epstein'
548 },
549 {
550 'login' => 'stty',
551 'fullname' => 'Simon T Tyson'
552 }
553 ]
554 }
555
556 If the option 'KeyAttr => "login"' were used to specify that the
557 'login' attribute is a key, the same XML would parse to:
558
559 {
560 'user' => {
561 'stty' => {
562 'fullname' => 'Simon T Tyson'
563 },
564 'grep' => {
565 'fullname' => 'Gary R Epstein'
566 }
567 }
568 }
569
570 The key attribute names should be supplied in an arrayref if there is
571 more than one. "XMLin()" will attempt to match attribute names in the
572 order supplied. "XMLout()" will use the first attribute name supplied
573 when 'unfolding' a hash into an array.
574
575 Note 1: The default value for 'KeyAttr' is ['name', 'key', 'id']. If
576 you do not want folding on input or unfolding on output you must set
577 this option to an empty list to disable the feature.
578
579 Note 2: If you wish to use this option, you should also enable the
580 "ForceArray" option. Without 'ForceArray', a single nested element
581 will be rolled up into a scalar rather than an array and therefore will
582 not be folded (since only arrays get folded).
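
To illustrate the 'unfolding' direction, this sketch passes an already folded hash to "XMLout()" with 'login' as the only key attribute. The data mirrors the example above.

```perl
use strict;
use warnings;
use XML::Simple;

my $folded = {
    user => {
        'grep' => { fullname => 'Gary R Epstein' },
        'stty' => { fullname => 'Simon T Tyson' },
    },
};

# 'login' is the first (and only) key attribute, so the hash keys
# are written back out as login="..." attributes
my $xml = XMLout($folded, KeyAttr => [ 'login' ]);
print $xml;
```

Each entry in the 'user' hash should come out as a separate "<user>" element carrying both a 'login' and a 'fullname' attribute.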
583
584 KeyAttr => { list } # in+out - important
585 This alternative (and preferred) method of specifying the key
586 attributes allows more fine grained control over which elements are
587 folded and on which attributes. For example the option 'KeyAttr => {
588 package => "id" }' will cause any package elements to be folded on the
589 'id' attribute. No other elements which have an 'id' attribute will be
590 folded at all.
591
592 Note: "XMLin()" will generate a warning (or a fatal error in "STRICT
593 MODE") if this syntax is used and an element which does not have the
594 specified key attribute is encountered (eg: a 'package' element without
595 an 'id' attribute, to use the example above). Warnings will only be
596 generated if -w is in force.
597
598 Two further variations are made possible by prefixing a '+' or a '-'
599 character to the attribute name:
600
601 The option 'KeyAttr => { user => "+login" }' will cause this XML:
602
603 <opt>
604 <user login="grep" fullname="Gary R Epstein" />
605 <user login="stty" fullname="Simon T Tyson" />
606 </opt>
607
608 to parse to this data structure:
609
610 {
611 'user' => {
612 'stty' => {
613 'fullname' => 'Simon T Tyson',
614 'login' => 'stty'
615 },
616 'grep' => {
617 'fullname' => 'Gary R Epstein',
618 'login' => 'grep'
619 }
620 }
621 }
622
623 The '+' indicates that the value of the key attribute should be copied
624 rather than moved to the folded hash key.
625
626 A '-' prefix would produce this result:
627
628 {
629 'user' => {
630 'stty' => {
631 'fullname' => 'Simon T Tyson',
632 '-login' => 'stty'
633 },
634 'grep' => {
635 'fullname' => 'Gary R Epstein',
636 '-login' => 'grep'
637 }
638 }
639 }
640
641 As described earlier, "XMLout" will ignore hash keys starting with a
642 '-'.
643
644 NoAttr => 1 # in+out - handy
645 When used with "XMLout()", the generated XML will contain no
646 attributes. All hash key/values will be represented as nested elements
647 instead.
648
649 When used with "XMLin()", any attributes in the XML will be ignored.
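
A brief sketch of both directions, using made-up data:

```perl
use strict;
use warnings;
use XML::Simple;

my $data = { name => 'sahara', osname => 'solaris' };

my $as_attrs    = XMLout($data, KeyAttr => []);
my $as_elements = XMLout($data, KeyAttr => [], NoAttr => 1);

print $as_attrs;      # scalar values become attributes of <opt>
print $as_elements;   # the same values become nested elements

# On input, the 'a' attribute is ignored and only <b> survives
my $ref = XMLin('<opt a="1"><b>2</b></opt>',
    NoAttr => 1, KeyAttr => [], ForceArray => 0);
print join(',', sort keys %$ref), "\n";   # prints "b"
```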
650
651 NoEscape => 1 # out - seldom used
652 By default, "XMLout()" will translate the characters '<', '>', '&' and
653 '"' to '&lt;', '&gt;', '&amp;' and '&quot;' respectively. Use this
654 option to suppress escaping (presumably because you've already escaped
655 the data in some more sophisticated manner).
656
657 NoIndent => 1 # out - seldom used
658 Set this option to 1 to disable "XMLout()"'s default 'pretty printing'
659 mode. With this option enabled, the XML output will all be on one line
660 (unless there are newlines in the data) - this may be easier for
661 downstream processing.
662
663 NoSort => 1 # out - seldom used
664 Newer versions of XML::Simple sort elements and attributes
665 alphabetically (*), by default. Enable this option to suppress the
666 sorting - possibly for backwards compatibility.
667
668 * Actually, sorting is alphabetical but 'key' attribute or element
669 names (as in 'KeyAttr') sort first. Also, when a hash of hashes is
670 'unfolded', the elements are sorted alphabetically by the value of the
671 key field.
672
673 NormaliseSpace => 0 | 1 | 2 # in - handy
674 This option controls how whitespace in text content is handled.
675 Recognised values for the option are:
676
677 · 0 = (default) whitespace is passed through unaltered (except of
678 course for the normalisation of whitespace in attribute values
679 which is mandated by the XML recommendation)
680
681 · 1 = whitespace is normalised in any value used as a hash key
682 (normalising means removing leading and trailing whitespace and
683 collapsing sequences of whitespace characters to a single space)
684
685 · 2 = whitespace is normalised in all text content
686
687 Note: you can spell this option with a 'z' if that is more natural for
688 you.
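
The difference between the default and level 2 can be sketched like this (the sample text is arbitrary):

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><title>  The   quick  brown fox </title></opt>';

my %opts = (KeyAttr => [], ForceArray => 0);

my $raw  = XMLin($xml, %opts);                        # default (0)
my $tidy = XMLin($xml, %opts, NormaliseSpace => 2);   # all text content

print "[$raw->{title}]\n";    # whitespace exactly as in the source
print "[$tidy->{title}]\n";   # prints "[The quick brown fox]"
```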
689
690 NSExpand => 1 # in+out handy - SAX only
691 This option controls namespace expansion - the translation of element
692 and attribute names of the form 'prefix:name' to '{uri}name'. For
693 example the element name 'xsl:template' might be expanded to:
694 '{http://www.w3.org/1999/XSL/Transform}template'.
695
696 By default, "XMLin()" will return element names and attribute names
697 exactly as they appear in the XML. Setting this option to 1 will cause
698 all element and attribute names to be expanded to include their
699 namespace URI.
700
701 Note: You must be using a SAX parser for this option to work (ie: it
702 does not work with XML::Parser).
703
704 This option also controls whether "XMLout()" performs the reverse
705 translation from '{uri}name' back to 'prefix:name'. The default is no
706 translation. If your data contains expanded names, you should set this
707 option to 1 otherwise "XMLout" will emit XML which is not well formed.
708
709 Note: You must have the XML::NamespaceSupport module installed if you
710 want "XMLout()" to translate URIs back to prefixes.
711
712 NumericEscape => 0 | 1 | 2 # out - handy
713 Use this option to have 'high' (non-ASCII) characters in your Perl data
714 structure converted to numeric entities (eg: &#8364;) in the XML
715 output. Three levels are possible:
716
717 0 - default: no numeric escaping (OK if you're writing out UTF8)
718
719 1 - only characters above 0xFF are escaped (ie: characters in the
720 0x80-FF range are not escaped), possibly useful with ISO8859-1 output
721
722 2 - all characters above 0x7F are escaped (good for plain ASCII output)
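
The following sketch contrasts levels 1 and 2. The sample values (a euro sign, U+20AC, and an e-acute, U+00E9) are chosen only to fall either side of the 0xFF boundary.

```perl
use strict;
use warnings;
use XML::Simple;

my $data = {
    price => "\x{20AC}5",     # euro sign: above 0xFF
    cafe  => "caf\x{E9}",     # e-acute: in the 0x80-0xFF range
};

my $level1 = XMLout($data, KeyAttr => [], NumericEscape => 1);
my $level2 = XMLout($data, KeyAttr => [], NumericEscape => 2);

binmode(STDOUT, ':encoding(UTF-8)');
print $level1;   # only the euro sign is escaped
print $level2;   # both characters are escaped
```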
723
724 OutputFile => <file specifier> # out - handy
725 The default behaviour of "XMLout()" is to return the XML as a string.
726 If you wish to write the XML to a file, simply supply the filename
727 using the 'OutputFile' option.
728
729 This option also accepts an IO handle object - especially useful in
730 Perl 5.8.0 and later for output using an encoding other than UTF-8, eg:
731
732 open my $fh, '>:encoding(iso-8859-1)', $path or die "open($path): $!";
733 XMLout($ref, OutputFile => $fh);
734
735 Note, XML::Simple does not require that the object you pass in to the
736 OutputFile option inherits from IO::Handle - it simply assumes the
737 object supports a "print" method.
738
739 ParserOpts => [ XML::Parser Options ] # in - don't use this
740 Note: This option is now officially deprecated. If you find it useful,
741 email the author with an example of what you use it for. Do not use
742 this option to set the ProtocolEncoding, that's just plain wrong - fix
743 the XML.
744
745 This option allows you to pass parameters to the constructor of the
746 underlying XML::Parser object (which of course assumes you're not using
747 SAX).
748
749 RootName => 'string' # out - handy
750 By default, when "XMLout()" generates XML, the root element will be
751 named 'opt'. This option allows you to specify an alternative name.
752
753 Specifying either undef or the empty string for the RootName option
754 will produce XML with no root elements. In most cases the resulting
755 XML fragment will not be 'well formed' and therefore could not be read
756 back in by "XMLin()". Nevertheless, the option has been found to be
757 useful in certain circumstances.
758
759 SearchPath => [ list ] # in - handy
760 If you pass "XMLin()" a filename, but the filename includes no directory
761 component, you can use this option to specify which directories should
762 be searched to locate the file. You might use this option to search
763 first in the user's home directory, then in a global directory such as
764 /etc.
765
766 If a filename is provided to "XMLin()" but SearchPath is not defined,
767 the file is assumed to be in the current directory.
768
769 If the first parameter to "XMLin()" is undefined, the default
770 SearchPath will contain only the directory in which the script itself
771 is located. Otherwise the default SearchPath will be empty.
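
A sketch of a two-directory search follows. The temporary directories stand in for a user directory and a global directory, and the filename 'params.xml' is an assumption made for the example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::Simple qw(:strict);

# Stand-ins for a per-user directory and a global directory
my $user_dir   = tempdir(CLEANUP => 1);
my $global_dir = tempdir(CLEANUP => 1);

# Only the second directory actually holds params.xml
my $path = File::Spec->catfile($global_dir, 'params.xml');
open my $out, '>', $path or die "open($path): $!";
print {$out} '<opt tempdir="/tmp" />';
close $out or die "close($path): $!";

# The bare filename is looked up in each directory in turn
my $ref = XMLin('params.xml',
    SearchPath => [ $user_dir, $global_dir ],
    KeyAttr    => [],
    ForceArray => 0,
);
print $ref->{tempdir}, "\n";   # prints "/tmp"
```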
772
773 StrictMode => 1 | 0 # in+out - seldom used
774 This option allows you to turn "STRICT MODE" on or off for a particular
775 call, regardless of whether it was enabled at the time XML::Simple was
776 loaded.
777
778 SuppressEmpty => 1 | '' | undef # in+out - handy
779 This option controls what "XMLin()" should do with empty elements (no
780 attributes and no content). The default behaviour is to represent them
781 as empty hashes. Setting this option to a true value (eg: 1) will
782 cause empty elements to be skipped altogether. Setting the option to
783 'undef' or the empty string will cause empty elements to be represented
784 as the undefined value or the empty string respectively. The latter
785 two alternatives are a little easier to test for in your code than a
786 hash with no keys.
787
788 The option also controls what "XMLout()" does with undefined values.
789 Setting the option to undef causes undefined values to be output as
790 empty elements (rather than empty attributes), it also suppresses the
791 generation of warnings about undefined values. Setting the option to a
792 true value (eg: 1) causes undefined values to be skipped altogether on
793 output.
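
On the input side, the four behaviours can be compared side by side with a sketch like this (element names are arbitrary):

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml  = '<opt><a></a><b>x</b></opt>';
my %opts = (KeyAttr => [], ForceArray => 0);

my $default = XMLin($xml, %opts);                           # a => {}
my $skipped = XMLin($xml, %opts, SuppressEmpty => 1);       # no 'a' key
my $blank   = XMLin($xml, %opts, SuppressEmpty => '');      # a => ''
my $undef   = XMLin($xml, %opts, SuppressEmpty => undef);   # a => undef

print "default: ", ref($default->{a}), "\n";                # prints "default: HASH"
print "skipped: ", (exists $skipped->{a} ? 'present' : 'absent'), "\n";
print "blank:   '", $blank->{a}, "'\n";
print "undef:   ", (defined $undef->{a} ? 'defined' : 'undefined'), "\n";
```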
794
795 ValueAttr => [ names ] # in - handy
796 Use this option to deal with elements which always have a single attribute
797 and no content. Eg:
798
799 <opt>
800 <colour value="red" />
801 <size value="XXL" />
802 </opt>
803
804 Setting "ValueAttr => [ 'value' ]" will cause the above XML to parse
805 to:
806
807 {
808 colour => 'red',
809 size => 'XXL'
810 }
811
812 instead of this (the default):
813
814 {
815 colour => { value => 'red' },
816 size => { value => 'XXL' }
817 }
818
819 Note: This form of the ValueAttr option is not compatible with
820 "XMLout()" - since the attribute name is discarded at parse time, the
821 original XML cannot be reconstructed.
822
823 ValueAttr => { element => attribute, ... } # in+out - handy
824 This (preferred) form of the ValueAttr option requires you to specify
825 both the element and the attribute names. This is not only safer, it
826 also allows the original XML to be reconstructed by "XMLout()".
827
828 Note: You probably don't want to use this option and the NoAttr option
829 at the same time.
830
831 Variables => { name => value } # in - handy
832 This option allows variables in the XML to be expanded when the file is
833 read (there is no facility for putting the variable names back if you
834 regenerate XML using "XMLout()").
835
836 A 'variable' is any text of the form "${name}" which occurs in an
837 attribute value or in the text content of an element. If 'name'
838 matches a key in the supplied hashref, "${name}" will be replaced with
839 the corresponding value from the hashref. If no matching key is found,
840 the variable will not be replaced. Names must match the regex:
841 "[\w.]+" (ie: only 'word' characters and dots are allowed).
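
A short sketch of the expansion rules, assuming a variable called 'base' and a deliberately undefined one called 'unknown' (both names are invented for the example):

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

# Single quotes stop Perl itself from interpolating ${...}.
my $xml = '<opt logdir="${base}/log">'
        . '<tmp>${base}/tmp</tmp>'
        . '<other>${unknown}</other>'
        . '</opt>';

my $cfg = XMLin($xml,
    KeyAttr    => [],
    ForceArray => 0,
    Variables  => { base => '/srv/app' },
);

print "$cfg->{logdir}\n";   # expanded in an attribute value
print "$cfg->{tmp}\n";      # expanded in text content
print "$cfg->{other}\n";    # no matching key, so left untouched
```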
842
843 VarAttr => 'attr_name' # in - handy
844 In addition to the variables defined using "Variables", this option
845 allows variables to be defined in the XML. A variable definition
846 consists of an element with an attribute called 'attr_name' (the value
847 of the "VarAttr" option). The value of the attribute will be used as
848 the variable name and the text content of the element will be used as
849 the value. A variable defined in this way will override a variable
850 defined using the "Variables" option. For example:
851
852 XMLin( '<opt>
853 <dir name="prefix">/usr/local/apache</dir>
854 <dir name="exec_prefix">${prefix}</dir>
855 <dir name="bindir">${exec_prefix}/bin</dir>
856 </opt>',
857 VarAttr => 'name', ContentKey => '-content'
858 );
859
860 produces the following data structure:
861
862 {
863 dir => {
864 prefix => '/usr/local/apache',
865 exec_prefix => '/usr/local/apache',
866 bindir => '/usr/local/apache/bin',
867 }
868 }
869
870 XMLDecl => 1 or XMLDecl => 'string' # out - handy
871 If you want the output from "XMLout()" to start with the optional XML
872 declaration, simply set the option to '1'. The default XML declaration
873 is:
874
875 <?xml version='1.0' standalone='yes'?>
876
877 If you want some other string (for example to declare an encoding
878 value), set the value of this option to the complete string you
879 require.
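
For instance, the two forms might be used as follows (the data and the encoding declaration are only examples; "KeyAttr" is passed because strict mode requires it):

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $data = { name => 'sahara', osname => 'solaris' };

# A value of 1 selects the default declaration.
my $with_default = XMLout($data, KeyAttr => [], XMLDecl => 1);

# Any other string is emitted as-is as the first line of output.
my $decl        = '<?xml version="1.0" encoding="UTF-8"?>';
my $with_custom = XMLout($data, KeyAttr => [], XMLDecl => $decl);

print $with_default, $with_custom;
```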
880
882 The procedural interface is both simple and convenient; however, there
883 are a couple of reasons why you might prefer to use the object oriented
884 (OO) interface:
885
886 · to define a set of default values which should be used on all
887 subsequent calls to "XMLin()" or "XMLout()"
888
889 · to override methods in XML::Simple to provide customised behaviour
890
891 The default values for the options described above are unlikely to suit
892 everyone. The OO interface allows you to effectively override
893 XML::Simple's defaults with your preferred values. It works like this:
894
895 First create an XML::Simple parser object with your preferred defaults:
896
897 my $xs = XML::Simple->new(ForceArray => 1, KeepRoot => 1);
898
899 then call "XMLin()" or "XMLout()" as a method of that object:
900
901 my $ref = $xs->XMLin($xml);
902 my $xml = $xs->XMLout($ref);
903
904 You can also specify options when you make the method calls and these
905 values will be merged with the values specified when the object was
906 created. Values specified in a method call take precedence.
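
A small sketch of that precedence, using an invented one-element document:

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

# Defaults that apply to every call made through this object.
my $xs  = XML::Simple->new(KeyAttr => [], ForceArray => 1);
my $xml = '<opt><item>a</item></opt>';

my $forced = $xs->XMLin($xml);                   # item => [ 'a' ]
my $plain  = $xs->XMLin($xml, ForceArray => 0);  # item => 'a'
my $again  = $xs->XMLin($xml);                   # item => [ 'a' ] once more

print ref $forced->{item} ? "array\n" : "scalar\n";
print ref $plain->{item}  ? "array\n" : "scalar\n";
```

The per-call override applies to that call only; the third call falls back to the defaults given to "new()".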
907
908 Note: when called as methods, the "XMLin()" and "XMLout()" routines may
909 be called as "xml_in()" or "xml_out()". The method names are aliased
910 so the only difference is the aesthetics.
911
912 Parsing Methods
913 You can explicitly call one of the following methods rather than rely
914 on the "xml_in()" method automatically determining whether the target
915 to be parsed is a string, a file or a filehandle:
916
917 parse_string(text)
918 Works exactly like the "xml_in()" method but assumes the first
919 argument is a string of XML (or a reference to a scalar containing
920 a string of XML).
921
922 parse_file(filename)
923 Works exactly like the "xml_in()" method but assumes the first
924 argument is the name of a file containing XML.
925
926 parse_fh(file_handle)
927 Works exactly like the "xml_in()" method but assumes the first
928 argument is a filehandle which can be read to get XML.
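
A minimal sketch of the explicit methods. The temporary file is created with File::Temp purely so the example is self-contained; any readable XML file would do.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Simple qw(:strict);

my $xs  = XML::Simple->new(KeyAttr => [], ForceArray => 0);
my $xml = '<opt><host>sahara</host></opt>';

# The argument is always treated as XML text, never as a filename.
my $from_string = $xs->parse_string($xml);

# Write the same document to a scratch file and parse it by name.
my ($fh, $filename) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} $xml;
close $fh or die "Cannot close $filename: $!";

my $from_file = $xs->parse_file($filename);

print "$from_string->{host} $from_file->{host}\n";
```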
929
930 Hook Methods
931 You can make your own class which inherits from XML::Simple and
932 overrides certain behaviours. The following methods may provide useful
933 'hooks' upon which to hang your modified behaviour. You may find other
934 undocumented methods by examining the source, but those may be subject
935 to change in future releases.
936
937 handle_options(direction, name => value ...)
938 This method will be called when one of the parsing methods or the
939 "XMLout()" method is called. The initial argument will be a string
940 (either 'in' or 'out') and the remaining arguments will be name
941 value pairs.
942
943 default_config_file()
944 Calculates and returns the name of the file which should be parsed
945 if no filename is passed to "XMLin()" (default: "$0.xml").
946
947 build_simple_tree(filename, string)
948 Called from "XMLin()" or any of the parsing methods. Takes either
949 a file name as the first argument or "undef" followed by a 'string'
950 as the second argument. Returns a simple tree data structure. You
951 could override this method to apply your own transformations before
952 the data structure is returned to the caller.
953
954 new_hashref()
955 When the 'simple tree' data structure is being built, this method
956 will be called to create any required anonymous hashrefs.
957
958 sorted_keys(name, hashref)
959 Called when "XMLout()" is translating a hashref to XML. This
960 routine returns a list of hash keys in the order that the
961 corresponding attributes/elements should appear in the output.
962
963 escape_value(string)
964 Called from "XMLout()", takes a string and returns a copy of the
965 string with XML character escaping rules applied.
966
967 numeric_escape(string)
968 Called from "escape_value()", to handle non-ASCII characters
969 (depending on the value of the NumericEscape option).
970
971 copy_hash(hashref, extra_key => value, ...)
972 Called from "XMLout()", when 'unfolding' a hash of hashes into an
973 array of hashes. You might wish to override this method if you're
974 using tied hashes and don't want them to get untied.
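
As an illustration of a hook, here is a sketch of a subclass that overrides "sorted_keys()" so keys are emitted in reverse alphabetical order. The class name My::Simple and the ordering rule are invented for the example.

```perl
package My::Simple;

use strict;
use warnings;
use parent 'XML::Simple';

# Return the keys in the order they should appear in the output.
sub sorted_keys {
    my ($self, $name, $hashref) = @_;
    return reverse sort keys %$hashref;
}

package main;

my $xs  = My::Simple->new(KeyAttr => [], RootName => 'server');
my $out = $xs->XMLout({ name => 'gobi', osname => 'irix' });

print $out;   # osname is written before name
```

A real override would more likely consult a per-element list of preferred key orders, using the "name" argument to choose the list.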
975
976 Cache Methods
977 XML::Simple implements three caching schemes ('storable', 'memshare'
978 and 'memcopy'). You can implement a custom caching scheme by
979 implementing two methods - one for reading from the cache and one for
980 writing to it.
981
982 For example, you might implement a new 'dbm' scheme that stores cached
983 data structures using the MLDBM module. First, you would add a
984 "cache_read_dbm()" method which accepted a filename for use as a lookup
985 key and returned a data structure on success, or undef on failure.
986 Then, you would implement a "cache_write_dbm()" method which accepted a
987 data structure and a filename.
988
989 You would use this caching scheme by specifying the option:
990
991 Cache => [ 'dbm' ]
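
The sketch below shows the shape of such a pair of methods using a hypothetical 'demo' scheme that keeps parsed structures in a package-level hash rather than MLDBM. The class name, the scheme name and the $WRITES counter are all assumptions made for illustration, and unlike the built-in schemes it makes no attempt to notice that the file has changed.

```perl
package My::CachingSimple;

use strict;
use warnings;
use parent 'XML::Simple';
use Storable qw(dclone);

my %CACHE;            # filename => parsed data structure
our $WRITES = 0;      # counts cache writes, for demonstration only

# Return a copy of the cached structure, or undef on a cache miss.
sub cache_read_demo {
    my ($self, $filename) = @_;
    return exists $CACHE{$filename} ? dclone($CACHE{$filename}) : undef;
}

# Store a copy of the freshly parsed structure under the filename.
sub cache_write_demo {
    my ($self, $data, $filename) = @_;
    $CACHE{$filename} = dclone($data);
    $WRITES++;
    return;
}

package main;

use File::Temp qw(tempfile);

my ($fh, $filename) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<opt><host>sahara</host></opt>';
close $fh or die "Cannot close $filename: $!";

my $xs = My::CachingSimple->new(
    KeyAttr => [], ForceArray => 0, Cache => [ 'demo' ],
);
my $first  = $xs->XMLin($filename);   # parsed, then written to the cache
my $second = $xs->XMLin($filename);   # expected to come from the cache

print "$second->{host} (cache writes: $My::CachingSimple::WRITES)\n";
```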
992
994 If you import the XML::Simple routines like this:
995
996 use XML::Simple qw(:strict);
997
998 the following common mistakes will be detected and treated as fatal
999 errors:
1000
1001 · Failing to explicitly set the "KeyAttr" option - if you can't be
1002 bothered reading about this option, turn it off with: KeyAttr => [
1003 ]
1004
1005 · Failing to explicitly set the "ForceArray" option - if you can't be
1006 bothered reading about this option, set it to the safest mode with:
1007 ForceArray => 1
1008
1009 · Setting ForceArray to an array, but failing to list all the
1010 elements from the KeyAttr hash.
1011
1012 · Data error - KeyAttr is set to, say, { part => 'partnum' } but the
1013 XML contains one or more <part> elements without a 'partnum'
1014 attribute (or nested element). Note: if strict mode is not set but
1015 -w is, this condition triggers a warning.
1016
1017 · Data error - as above, but non-unique values are present in the key
1018 attribute (eg: more than one <part> element with the same partnum).
1019 This will also trigger a warning if strict mode is not enabled.
1020
1021 · Data error - as above, but value of key attribute (eg: partnum) is
1022 not a scalar string (due to nested elements etc). This will also
1023 trigger a warning if strict mode is not enabled.
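
The first two checks can be seen with a sketch like the following. The <part> document is invented for the example, and the exact wording of the error messages is not guaranteed beyond naming the missing option.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><part partnum="p1" name="bolt" /></opt>';

# Both options given, and 'part' appears in both: accepted.
my $ok = eval {
    XMLin($xml, KeyAttr => { part => 'partnum' }, ForceArray => [ 'part' ]);
};

# KeyAttr omitted: fatal under strict mode.
my $no_keyattr = eval { XMLin($xml, ForceArray => 1); 1 } ? '' : $@;

# ForceArray omitted: also fatal.
my $no_forcearray = eval { XMLin($xml, KeyAttr => []); 1 } ? '' : $@;

print "part name: $ok->{part}{p1}{name}\n";
print "without KeyAttr:    $no_keyattr";
print "without ForceArray: $no_forcearray";
```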
1024
1026 From version 1.08_01, XML::Simple includes support for SAX (the Simple
1027 API for XML) - specifically SAX2.
1028
1029 In a typical SAX application, an XML parser (or SAX 'driver') module
1030 generates SAX events (start of element, character data, end of element,
1031 etc) as it parses an XML document and a 'handler' module processes the
1032 events to extract the required data. This simple model allows for some
1033 interesting and powerful possibilities:
1034
1035 · Applications written to the SAX API can extract data from huge XML
1036 documents without the memory overheads of a DOM or tree API.
1037
1038 · The SAX API allows for plug and play interchange of parser modules
1039 without having to change your code to fit a new module's API. A
1040 number of SAX parsers are available with capabilities ranging from
1041 extreme portability to blazing performance.
1042
1043 · A SAX 'filter' module can implement both a handler interface for
1044 receiving data and a generator interface for passing modified data
1045 on to a downstream handler. Filters can be chained together in
1046 'pipelines'.
1047
1048 · One filter module might split a data stream to direct data to two
1049 or more downstream handlers.
1050
1051 · Generating SAX events is not the exclusive preserve of XML parsing
1052 modules. For example, a module might extract data from a
1053 relational database using DBI and pass it on to a SAX pipeline for
1054 filtering and formatting.
1055
1056 XML::Simple can operate at either end of a SAX pipeline. For example,
1057 you can take a data structure in the form of a hashref and pass it into
1058 a SAX pipeline using the 'Handler' option on "XMLout()":
1059
1060 use XML::Simple;
1061 use Some::SAX::Filter;
1062 use XML::SAX::Writer;
1063
1064 my $ref = {
1065 .... # your data here
1066 };
1067
1068 my $writer = XML::SAX::Writer->new();
1069 my $filter = Some::SAX::Filter->new(Handler => $writer);
1070 my $simple = XML::Simple->new(Handler => $filter);
1071 $simple->XMLout($ref);
1072
1073 You can also put XML::Simple at the opposite end of the pipeline to
1074 take advantage of the simple 'tree' data structure once the relevant
1075 data has been isolated through filtering:
1076
1077 use XML::SAX;
1078 use Some::SAX::Filter;
1079 use XML::Simple;
1080
1081 my $simple = XML::Simple->new(ForceArray => 1, KeyAttr => ['partnum']);
1082 my $filter = Some::SAX::Filter->new(Handler => $simple);
1083 my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1084
1085 my $ref = $parser->parse_uri('some_huge_file.xml');
1086
1087 print $ref->{part}->{'555-1234'};
1088
1089 You can build a filter by using an XML::Simple object as a handler and
1090 setting its DataHandler option to point to a routine which takes the
1091 resulting tree, modifies it and sends it off as SAX events to a
1092 downstream handler:
1093
1094 my $writer = XML::SAX::Writer->new();
1095 my $filter = XML::Simple->new(
1096 DataHandler => sub {
1097 my $simple = shift;
1098 my $data = shift;
1099
1100 # Modify $data here
1101
1102 $simple->XMLout($data, Handler => $writer);
1103 }
1104 );
1105 my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1106
1107 $parser->parse_uri($filename);
1108
1109 Note: In this last example, the 'Handler' option was specified in the
1110 call to "XMLout()" but it could also have been specified in the
1111 constructor.
1112
1114 If you don't care which parser module XML::Simple uses then skip this
1115 section entirely (it looks more complicated than it really is).
1116
1117 XML::Simple will default to using a SAX parser if one is available or
1118 XML::Parser if SAX is not available.
1119
1120 You can dictate which parser module is used by setting either the
1121 environment variable 'XML_SIMPLE_PREFERRED_PARSER' or the package
1122 variable $XML::Simple::PREFERRED_PARSER to contain the module name.
1123 The following rules are used:
1124
1125 · The package variable takes precedence over the environment variable
1126 if both are defined. To force XML::Simple to ignore the
1127 environment settings and use its default rules, you can set the
1128 package variable to an empty string.
1129
1130 · If the 'preferred parser' is set to the string 'XML::Parser', then
1131 XML::Parser will be used (or "XMLin()" will die if XML::Parser is
1132 not installed).
1133
1134 · If the 'preferred parser' is set to some other value, then it is
1135 assumed to be the name of a SAX parser module and is passed to
1136 XML::SAX::ParserFactory. If XML::SAX is not installed, or the
1137 requested parser module is not installed, then "XMLin()" will die.
1138
1139 · If the 'preferred parser' is not defined at all (the normal default
1140 state), an attempt will be made to load XML::SAX. If XML::SAX is
1141 installed, then a parser module will be selected according to
1142 XML::SAX::ParserFactory's normal rules (which typically means the
1143 last SAX parser installed).
1144
1145 · If the 'preferred parser' is not defined and XML::SAX is not
1146 installed, then XML::Parser will be used. "XMLin()" will die if
1147 XML::Parser is not installed.
1148
1149 Note: The XML::SAX distribution includes an XML parser written entirely
1150 in Perl. It is very portable but it is not very fast. You should
1151 consider installing XML::LibXML or XML::SAX::Expat if they are
1152 available for your platform.
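
As a sketch, the package variable can be set for a limited scope with "local". Whether the call succeeds depends on XML::Parser being installed, so the example traps the failure rather than assuming either outcome.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my ($ref, $err);
{
    # Applies only inside this block; the previous value is restored after.
    local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';

    $ref = eval {
        XMLin('<opt><a>1</a></opt>', KeyAttr => [], ForceArray => 0);
    };
    $err = $@;
}

if (defined $ref) {
    print "parsed with XML::Parser: a = $ref->{a}\n";
}
else {
    print "XML::Parser could not be used: $err";
}
```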
1153
1155 The XML standard is very clear on the issue of non-compliant documents.
1156 An error in parsing any single element (for example a missing end tag)
1157 must cause the whole document to be rejected. XML::Simple will die
1158 with an appropriate message if it encounters a parsing error.
1159
1160 If dying is not appropriate for your application, you should arrange to
1161 call "XMLin()" in an eval block and look for errors in $@. eg:
1162
1163 my $config = eval { XMLin() };
1164 PopUpMessage($@) if($@);
1165
1166 Note, there is a common misconception that use of eval will
1167 significantly slow down a script. While that may be true when the code
1168 being eval'd is in a string, it is not true of code like the sample
1169 above.
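
A slightly fuller sketch of the same pattern, using a deliberately malformed document. Falling back to an empty hashref is just one possible recovery strategy, chosen here for illustration.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

# Deliberately malformed: <server> is never closed.
my $bad_xml = '<opt><server>sahara</opt>';

my $config;
my $ok = eval {
    $config = XMLin($bad_xml, KeyAttr => [], ForceArray => 0);
    1;    # reached only if XMLin() did not die
};

if (!$ok) {
    my $error = $@ || 'unknown error';
    warn "Could not parse configuration: $error";
    $config = {};    # hypothetical fallback: carry on with no settings
}
```

Having the eval block return a true value on success avoids relying on $@ alone to decide whether the parse failed.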
1170
1172 When "XMLin()" reads the following very simple piece of XML:
1173
1174 <opt username="testuser" password="frodo"></opt>
1175
1176 it returns the following data structure:
1177
1178 {
1179 'username' => 'testuser',
1180 'password' => 'frodo'
1181 }
1182
1183 The identical result could have been produced with this alternative
1184 XML:
1185
1186 <opt username="testuser" password="frodo" />
1187
1188 Or this (although see 'ForceArray' option for variations):
1189
1190 <opt>
1191 <username>testuser</username>
1192 <password>frodo</password>
1193 </opt>
1194
1195 Repeated nested elements are represented as anonymous arrays:
1196
1197 <opt>
1198 <person firstname="Joe" lastname="Smith">
1199 <email>joe@smith.com</email>
1200 <email>jsmith@yahoo.com</email>
1201 </person>
1202 <person firstname="Bob" lastname="Smith">
1203 <email>bob@smith.com</email>
1204 </person>
1205 </opt>
1206
1207 {
1208 'person' => [
1209 {
1210 'email' => [
1211 'joe@smith.com',
1212 'jsmith@yahoo.com'
1213 ],
1214 'firstname' => 'Joe',
1215 'lastname' => 'Smith'
1216 },
1217 {
1218 'email' => 'bob@smith.com',
1219 'firstname' => 'Bob',
1220 'lastname' => 'Smith'
1221 }
1222 ]
1223 }
1224
1225 Nested elements with a recognised key attribute are transformed
1226 (folded) from an array into a hash keyed on the value of that attribute
1227 (see the "KeyAttr" option):
1228
1229 <opt>
1230 <person key="jsmith" firstname="Joe" lastname="Smith" />
1231 <person key="tsmith" firstname="Tom" lastname="Smith" />
1232 <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
1233 </opt>
1234
1235 {
1236 'person' => {
1237 'jbloggs' => {
1238 'firstname' => 'Joe',
1239 'lastname' => 'Bloggs'
1240 },
1241 'tsmith' => {
1242 'firstname' => 'Tom',
1243 'lastname' => 'Smith'
1244 },
1245 'jsmith' => {
1246 'firstname' => 'Joe',
1247 'lastname' => 'Smith'
1248 }
1249 }
1250 }
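
In code, the folded structure can be addressed directly by key. This sketch names the key attribute explicitly (as strict mode requires) rather than relying on the default "KeyAttr" list:

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = <<'XML';
<opt>
  <person key="jsmith" firstname="Joe" lastname="Smith" />
  <person key="tsmith" firstname="Tom" lastname="Smith" />
</opt>
XML

my $ref = XMLin($xml,
    KeyAttr    => { person => 'key' },
    ForceArray => [ 'person' ],
);

# Direct lookup by the value of the 'key' attribute.
print $ref->{person}{tsmith}{firstname}, "\n";

# The hash has no inherent order, so sort the keys when iterating.
for my $id (sort keys %{ $ref->{person} }) {
    my $p = $ref->{person}{$id};
    print "$id: $p->{firstname} $p->{lastname}\n";
}
```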
1251
1252 The <anon> tag can be used to form anonymous arrays:
1253
1254 <opt>
1255 <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
1256 <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
1257 <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
1258 <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
1259 </opt>
1260
1261 {
1262 'head' => [
1263 [ 'Col 1', 'Col 2', 'Col 3' ]
1264 ],
1265 'data' => [
1266 [ 'R1C1', 'R1C2', 'R1C3' ],
1267 [ 'R2C1', 'R2C2', 'R2C3' ],
1268 [ 'R3C1', 'R3C2', 'R3C3' ]
1269 ]
1270 }
1271
1272 Anonymous arrays can be nested to arbitrary levels and as a special
1273 case, if the surrounding tags for an XML document contain only an
1274 anonymous array the arrayref will be returned directly rather than the
1275 usual hashref:
1276
1277 <opt>
1278 <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
1279 <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
1280 <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
1281 </opt>
1282
1283 [
1284 [ 'Col 1', 'Col 2' ],
1285 [ 'R1C1', 'R1C2' ],
1286 [ 'R2C1', 'R2C2' ]
1287 ]
1288
1289 Elements which only contain text content will simply be represented as
1290 a scalar. Where an element has both attributes and text content, the
1291 element will be represented as a hashref with the text content in the
1292 'content' key (see the "ContentKey" option):
1293
1294 <opt>
1295 <one>first</one>
1296 <two attr="value">second</two>
1297 </opt>
1298
1299 {
1300 'one' => 'first',
1301 'two' => { 'attr' => 'value', 'content' => 'second' }
1302 }
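
Because the same kind of element can come back as either a scalar or a hashref, calling code often needs a check along these lines (a sketch; the helper name text_of is invented for the example):

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><one>first</one><two attr="value">second</two></opt>';
my $ref = XMLin($xml, KeyAttr => [], ForceArray => 0);

# Return the text of an element whether or not it also had attributes.
sub text_of {
    my ($value) = @_;
    return ref $value eq 'HASH' ? $value->{content} : $value;
}

print text_of($ref->{one}), "\n";
print text_of($ref->{two}), "\n";
print $ref->{two}{attr}, "\n";
```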
1303
1304 Mixed content (elements which contain both text content and nested
1305 elements) will not be represented in a useful way - element order
1306 and significant whitespace will be lost. If you need to work with
1307 mixed content, then XML::Simple is not the right tool for your job -
1308 check out the next section.
1309
1311 XML::Simple is able to present a simple API because it makes some
1312 assumptions on your behalf. These include:
1313
1314 · You're not interested in text content consisting only of whitespace
1315
1316 · You don't mind that when things get slurped into a hash the order
1317 is lost
1318
1319 · You don't want fine-grained control of the formatting of generated
1320 XML
1321
1322 · You would never use a hash key that was not a legal XML element
1323 name
1324
1325 · You don't need help converting between different encodings
1326
1327 In a serious XML project, you'll probably outgrow these assumptions
1328 fairly quickly. This section of the document used to offer some advice
1329 on choosing a more powerful option. That advice has now grown into the
1330 'Perl-XML FAQ' document which you can find at:
1331 <http://perl-xml.sourceforge.net/faq/>
1332
1333 The advice in the FAQ boils down to a quick explanation of tree versus
1334 event based parsers and then recommends:
1335
1336 For event based parsing, use SAX (do not set out to write any new code
1337 for XML::Parser's handler API - it is obsolete).
1338
1339 For tree-based parsing, you could choose between the 'Perlish' approach
1340 of XML::Twig and more standards based DOM implementations - preferably
1341 one with XPath support such as XML::LibXML.
1342
1344 XML::Simple requires either XML::Parser or XML::SAX.
1345
1346 To generate documents with namespaces, XML::NamespaceSupport is
1347 required.
1348
1349 The optional caching functions require Storable.
1350
1351 Answers to Frequently Asked Questions about XML::Simple are bundled
1352 with this distribution as: XML::Simple::FAQ
1353
1355 Copyright 1999-2004 Grant McLean <grantm@cpan.org>
1356
1357 This library is free software; you can redistribute it and/or modify it
1358 under the same terms as Perl itself.
1359
1360
1361
1362perl v5.16.3 2012-06-20 XML::Simple(3)