XML::Simple(3) User Contributed Perl Documentation XML::Simple(3)

NAME
XML::Simple - Easy API to maintain XML (esp config files)
7
SYNOPSIS
9 use XML::Simple;
10
11 my $ref = XMLin([<xml file or string>] [, <options>]);
12
13 my $xml = XMLout($hashref [, <options>]);
14
15 Or the object oriented way:
16
17 require XML::Simple;
18
19 my $xs = XML::Simple->new(options);
20
21 my $ref = $xs->XMLin([<xml file or string>] [, <options>]);
22
23 my $xml = $xs->XMLout($hashref [, <options>]);
24
25 (or see "SAX SUPPORT" for 'the SAX way').
26
27 To catch common errors:
28
29 use XML::Simple qw(:strict);
30
31 (see "STRICT MODE" for more details).
32
QUICK START
34 Say you have a script called foo and a file of configuration options
35 called foo.xml containing this:
36
37 <config logdir="/var/log/foo/" debugfile="/tmp/foo.debug">
38 <server name="sahara" osname="solaris" osversion="2.6">
39 <address>10.0.0.101</address>
40 <address>10.0.1.101</address>
41 </server>
42 <server name="gobi" osname="irix" osversion="6.5">
43 <address>10.0.0.102</address>
44 </server>
45 <server name="kalahari" osname="linux" osversion="2.0.34">
46 <address>10.0.0.103</address>
47 <address>10.0.1.103</address>
48 </server>
49 </config>
50
51 The following lines of code in foo:
52
53 use XML::Simple;
54
55 my $config = XMLin();
56
57 will 'slurp' the configuration options into the hashref $config
58 (because no arguments are passed to "XMLin()" the name and location of
59 the XML file will be inferred from name and location of the script).
60 You can dump out the contents of the hashref using Data::Dumper:
61
62 use Data::Dumper;
63
64 print Dumper($config);
65
66 which will produce something like this (formatting has been adjusted
67 for brevity):
68
69 {
70 'logdir' => '/var/log/foo/',
71 'debugfile' => '/tmp/foo.debug',
72 'server' => {
73 'sahara' => {
74 'osversion' => '2.6',
75 'osname' => 'solaris',
76 'address' => [ '10.0.0.101', '10.0.1.101' ]
77 },
78 'gobi' => {
79 'osversion' => '6.5',
80 'osname' => 'irix',
81 'address' => '10.0.0.102'
82 },
83 'kalahari' => {
84 'osversion' => '2.0.34',
85 'osname' => 'linux',
86 'address' => [ '10.0.0.103', '10.0.1.103' ]
87 }
88 }
89 }
90
91 Your script could then access the name of the log directory like this:
92
93 print $config->{logdir};
94
Similarly, the second address on the server 'kalahari' could be
referenced as:
97
98 print $config->{server}->{kalahari}->{address}->[1];
99
100 What could be simpler? (Rhetorical).
101
102 For simple requirements, that's really all there is to it. If you want
103 to store your XML in a different directory or file, or pass it in as a
104 string or even pass it in via some derivative of an IO::Handle, you'll
105 need to check out "OPTIONS". If you want to turn off or tweak the
106 array folding feature (that neat little transformation that produced
107 $config->{server}) you'll find options for that as well.
108
109 If you want to generate XML (for example to write a modified version of
110 $config back out as XML), check out "XMLout()".
111
112 If your needs are not so simple, this may not be the module for you.
113 In that case, you might want to read "WHERE TO FROM HERE?".
114
DESCRIPTION
The XML::Simple module provides a simple API layer on top of an
underlying XML parsing module (either XML::Parser or one of the SAX2
parser modules). Two functions are exported: "XMLin()" and "XMLout()".
Note: you can explicitly request the lower case versions of the
function names: "xml_in()" and "xml_out()".
121
122 The simplest approach is to call these two functions directly, but an
123 optional object oriented interface (see "OPTIONAL OO INTERFACE" below)
124 allows them to be called as methods of an XML::Simple object. The
125 object interface can also be used at either end of a SAX pipeline.
126
127 XMLin()
128
129 Parses XML formatted data and returns a reference to a data structure
130 which contains the same information in a more readily accessible form.
131 (Skip down to "EXAMPLES" below, for more sample code).
132
133 "XMLin()" accepts an optional XML specifier followed by zero or more
134 'name => value' option pairs. The XML specifier can be one of the fol‐
135 lowing:
136
137 A filename
138 If the filename contains no directory components "XMLin()" will
139 look for the file in each directory in the SearchPath (see
140 "OPTIONS" below) or in the current directory if the SearchPath
141 option is not defined. eg:
142
143 $ref = XMLin('/etc/params.xml');
144
145 Note, the filename '-' can be used to parse from STDIN.
146
147 undef
148 If there is no XML specifier, "XMLin()" will check the script
149 directory and each of the SearchPath directories for a file with
150 the same name as the script but with the extension '.xml'. Note:
151 if you wish to specify options, you must specify the value 'undef'.
152 eg:
153
154 $ref = XMLin(undef, ForceArray => 1);
155
156 A string of XML
157 A string containing XML (recognised by the presence of '<' and '>'
158 characters) will be parsed directly. eg:
159
160 $ref = XMLin('<opt username="bob" password="flurp" />');
161
162 An IO::Handle object
163 An IO::Handle object will be read to EOF and its contents parsed.
164 eg:
165
use IO::File;

$fh = IO::File->new('/etc/params.xml');
$ref = XMLin($fh);
168
169 XMLout()
170
171 Takes a data structure (generally a hashref) and returns an XML encod‐
172 ing of that structure. If the resulting XML is parsed using "XMLin()",
173 it should return a data structure equivalent to the original (see
174 caveats below).
175
The "XMLout()" function can also be used to output the XML as SAX
events (see the "Handler" option and "SAX SUPPORT" for more details).
178
179 When translating hashes to XML, hash keys which have a leading '-' will
180 be silently skipped. This is the approved method for marking elements
181 of a data structure which should be ignored by "XMLout". (Note: If
182 these items were not skipped the key names would be emitted as element
183 or attribute names with a leading '-' which would not be valid XML).
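
The skipping of '-' prefixed keys can be sketched as follows. The record
and its key names ('login', '-password') are made up for illustration;
only the leading '-' convention comes from the text above.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical record: the '-password' key is private and should not be emitted.
my $user = {
    login       => 'bob',
    '-password' => 'flurp',
};

# Only 'login' is serialised; the '-password' entry is skipped silently.
my $xml = XMLout($user, RootName => 'user');
print $xml;
```

This is a handy way to carry working data alongside a structure without
it leaking into the generated XML.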
184
185 Caveats
186
187 Some care is required in creating data structures which will be passed
188 to "XMLout()". Hash keys from the data structure will be encoded as
189 either XML element names or attribute names. Therefore, you should use
190 hash key names which conform to the relatively strict XML naming rules:
191
192 Names in XML must begin with a letter. The remaining characters may be
193 letters, digits, hyphens (-), underscores (_) or full stops (.). It is
194 also allowable to include one colon (:) in an element name but this
195 should only be used when working with namespaces (XML::Simple can only
196 usefully work with namespaces when teamed with a SAX Parser).
197
198 You can use other punctuation characters in hash values (just not in
199 hash keys) however XML::Simple does not support dumping binary data.
200
201 If you break these rules, the current implementation of "XMLout()" will
202 simply emit non-compliant XML which will be rejected if you try to read
203 it back in. (A later version of XML::Simple might take a more proac‐
204 tive approach).
205
206 Note also that although you can nest hashes and arrays to arbitrary
207 levels, circular data structures are not supported and will cause
208 "XMLout()" to die.
209
210 If you wish to 'round-trip' arbitrary data structures from Perl to XML
211 and back to Perl, then you should probably disable array folding (using
212 the KeyAttr option) both with "XMLout()" and with "XMLin()". If you
213 still don't get the expected results, you may prefer to use XML::Dumper
214 which is designed for exactly that purpose.
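
A minimal round-trip sketch, assuming a made-up list of servers. It
passes the same empty 'KeyAttr' list to both calls, as suggested above,
and then parses once more with the defaults to show the difference.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical data: a list of records, each with a 'name' field.
my $servers = {
    server => [
        { name => 'sahara', osname => 'solaris' },
        { name => 'gobi',   osname => 'irix'    },
    ],
};

# Disable folding/unfolding in both directions.
my %no_folding = (KeyAttr => []);

my $xml  = XMLout($servers, %no_folding);
my $copy = XMLin($xml, %no_folding);   # 'server' is still an array

# With the default KeyAttr the records are folded into a hash keyed on 'name'.
my $folded = XMLin($xml);

print ref($copy->{server}), " vs ", ref($folded->{server}), "\n";
```

If a list might contain only one record, 'ForceArray' would also be
needed on input to keep it as an array.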
215
216 Refer to "WHERE TO FROM HERE?" if "XMLout()" is too simple for your
217 needs.
218
OPTIONS
220 XML::Simple supports a number of options (in fact as each release of
221 XML::Simple adds more options, the module's claim to the name 'Simple'
222 becomes increasingly tenuous). If you find yourself repeatedly having
223 to specify the same options, you might like to investigate "OPTIONAL OO
224 INTERFACE" below.
225
226 If you can't be bothered reading the documentation, refer to "STRICT
227 MODE" to automatically catch common mistakes.
228
229 Because there are so many options, it's hard for new users to know
230 which ones are important, so here are the two you really need to know
231 about:
232
233 · check out "ForceArray" because you'll almost certainly want to turn
234 it on
235
236 · make sure you know what the "KeyAttr" option does and what its
237 default value is because it may surprise you otherwise (note in
238 particular that 'KeyAttr' affects both "XMLin" and "XMLout")
239
240 The option name headings below have a trailing 'comment' - a hash fol‐
241 lowed by two pieces of metadata:
242
243 · Options are marked with 'in' if they are recognised by "XMLin()"
244 and 'out' if they are recognised by "XMLout()".
245
246 · Each option is also flagged to indicate whether it is:
247
248 'important' - don't use the module until you understand this one
249 'handy' - you can skip this on the first time through
250 'advanced' - you can skip this on the second time through
251 'SAX only' - don't worry about this unless you're using SAX (or
252 alternatively if you need this, you also need SAX)
253 'seldom used' - you'll probably never use this unless you were the
254 person that requested the feature
255
256 The options are listed alphabetically:
257
258 Note: option names are no longer case sensitive so you can use the
259 mixed case versions shown here; all lower case as required by versions
260 2.03 and earlier; or you can add underscores between the words (eg:
261 key_attr).
262
263 AttrIndent => 1 # out - handy
264
265 When you are using "XMLout()", enable this option to have attributes
266 printed one-per-line with sensible indentation rather than all on one
267 line.
268
269 Cache => [ cache schemes ] # in - advanced
270
271 Because loading the XML::Parser module and parsing an XML file can con‐
272 sume a significant number of CPU cycles, it is often desirable to cache
273 the output of "XMLin()" for later reuse.
274
275 When parsing from a named file, XML::Simple supports a number of
276 caching schemes. The 'Cache' option may be used to specify one or more
277 schemes (using an anonymous array). Each scheme will be tried in turn
278 in the hope of finding a cached pre-parsed representation of the XML
279 file. If no cached copy is found, the file will be parsed and the
280 first cache scheme in the list will be used to save a copy of the
281 results. The following cache schemes have been implemented:
282
283 storable
284 Utilises Storable.pm to read/write a cache file with the same name
285 as the XML file but with the extension .stor
286
287 memshare
288 When a file is first parsed, a copy of the resulting data structure
289 is retained in memory in the XML::Simple module's namespace. Sub‐
290 sequent calls to parse the same file will return a reference to
291 this structure. This cached version will persist only for the life
292 of the Perl interpreter (which in the case of mod_perl for example,
293 may be some significant time).
294
295 Because each caller receives a reference to the same data struc‐
296 ture, a change made by one caller will be visible to all. For this
297 reason, the reference returned should be treated as read-only.
298
299 memcopy
300 This scheme works identically to 'memshare' (above) except that
301 each caller receives a reference to a new data structure which is a
302 copy of the cached version. Copying the data structure will add a
303 little processing overhead, therefore this scheme should only be
304 used where the caller intends to modify the data structure (or
305 wishes to protect itself from others who might). This scheme uses
306 Storable.pm to perform the copy.
307
308 Warning! The memory-based caching schemes compare the timestamp on the
309 file to the time when it was last parsed. If the file is stored on an
310 NFS filesystem (or other network share) and the clock on the file
311 server is not exactly synchronised with the clock where your script is
312 run, updates to the source XML file may appear to be ignored.
313
314 ContentKey => 'keyname' # in+out - seldom used
315
When text content is parsed to a hash value, this option lets you
specify a name for the hash key to override the default 'content'. So
for example:
319
320 XMLin('<opt one="1">Text</opt>', ContentKey => 'text')
321
322 will parse to:
323
324 { 'one' => 1, 'text' => 'Text' }
325
326 instead of:
327
328 { 'one' => 1, 'content' => 'Text' }
329
330 "XMLout()" will also honour the value of this option when converting a
331 hashref to XML.
332
333 You can also prefix your selected key name with a '-' character to have
334 "XMLin()" try a little harder to eliminate unnecessary 'content' keys
335 after array folding. For example:
336
337 XMLin(
338 '<opt><item name="one">First</item><item name="two">Second</item></opt>',
339 KeyAttr => {item => 'name'},
340 ForceArray => [ 'item' ],
341 ContentKey => '-content'
342 )
343
344 will parse to:
345
{
  'item' => {
    'one' => 'First',
    'two' => 'Second'
  }
}
352
353 rather than this (without the '-'):
354
{
  'item' => {
    'one' => { 'content' => 'First' },
    'two' => { 'content' => 'Second' }
  }
}
361
362 DataHandler => code_ref # in - SAX only
363
364 When you use an XML::Simple object as a SAX handler, it will return a
365 'simple tree' data structure in the same format as "XMLin()" would
366 return. If this option is set (to a subroutine reference), then when
367 the tree is built the subroutine will be called and passed two argu‐
368 ments: a reference to the XML::Simple object and a reference to the
369 data tree. The return value from the subroutine will be returned to
370 the SAX driver. (See "SAX SUPPORT" for more details).
371
372 ForceArray => 1 # in - important
373
374 This option should be set to '1' to force nested elements to be repre‐
375 sented as arrays even when there is only one. Eg, with ForceArray
376 enabled, this XML:
377
378 <opt>
379 <name>value</name>
380 </opt>
381
382 would parse to this:
383
384 {
385 'name' => [
386 'value'
387 ]
388 }
389
390 instead of this (the default):
391
392 {
393 'name' => 'value'
394 }
395
396 This option is especially useful if the data structure is likely to be
397 written back out as XML and the default behaviour of rolling single
398 nested elements up into attributes is not desirable.
399
If you are using the array folding feature, you should almost certainly
enable this option. If you do not, single nested elements will not be
parsed to arrays and therefore will not be candidates for folding to a
hash. (Given that the default value of 'KeyAttr' enables array folding,
this option should probably have been enabled by default too - sorry).
406
407 ForceArray => [ names ] # in - important
408
409 This alternative (and preferred) form of the 'ForceArray' option allows
410 you to specify a list of element names which should always be forced
411 into an array representation, rather than the 'all or nothing' approach
412 above.
413
414 It is also possible (since version 2.05) to include compiled regular
415 expressions in the list - any element names which match the pattern
416 will be forced to arrays. If the list contains only a single regex,
417 then it is not necessary to enclose it in an arrayref. Eg:
418
419 ForceArray => qr/_list$/
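
As an illustration of the list form, the sketch below forces 'server'
and 'address' elements into arrays so that every server can be handled
the same way, however many addresses it has. The XML is a cut-down,
hypothetical variant of the configuration file shown earlier.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<config>
  <server name="sahara">
    <address>10.0.0.101</address>
    <address>10.0.1.101</address>
  </server>
  <server name="gobi">
    <address>10.0.0.102</address>
  </server>
</config>
XML

my $config = XMLin(
    $xml,
    ForceArray => [ 'server', 'address' ],
    KeyAttr    => { server => 'name' },
);

# 'address' is an arrayref for every server, even 'gobi' which has only one.
for my $name (sort keys %{ $config->{server} }) {
    my @addresses = @{ $config->{server}{$name}{address} };
    print "$name: @addresses\n";
}
```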
420
421 ForceContent => 1 # in - seldom used
422
423 When "XMLin()" parses elements which have text content as well as
424 attributes, the text content must be represented as a hash value rather
425 than a simple scalar. This option allows you to force text content to
426 always parse to a hash value even when there are no attributes. So for
427 example:
428
429 XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)
430
431 will parse to:
432
433 {
434 'x' => { 'content' => 'text1' },
435 'y' => { 'a' => 2, 'content' => 'text2' }
436 }
437
438 instead of:
439
440 {
441 'x' => 'text1',
442 'y' => { 'a' => 2, 'content' => 'text2' }
443 }
444
445 GroupTags => { grouping tag => grouped tag } # in+out - handy
446
447 You can use this option to eliminate extra levels of indirection in
448 your Perl data structure. For example this XML:
449
450 <opt>
451 <searchpath>
452 <dir>/usr/bin</dir>
453 <dir>/usr/local/bin</dir>
454 <dir>/usr/X11/bin</dir>
455 </searchpath>
456 </opt>
457
458 Would normally be read into a structure like this:
459
460 {
461 searchpath => {
462 dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
463 }
464 }
465
466 But when read in with the appropriate value for 'GroupTags':
467
468 my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });
469
470 It will return this simpler structure:
471
472 {
473 searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
474 }
475
476 The grouping element ("<searchpath>" in the example) must not contain
477 any attributes or elements other than the grouped element.
478
479 You can specify multiple 'grouping element' to 'grouped element' map‐
480 pings in the same hashref. If this option is combined with "KeyAttr",
481 the array folding will occur first and then the grouped element names
482 will be eliminated.
483
484 "XMLout" will also use the grouptag mappings to re-introduce the tags
485 around the grouped elements. Beware though that this will occur in all
486 places that the 'grouping tag' name occurs - you probably don't want to
487 use the same name for elements as well as attributes.
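
The sketch below reads the search path with 'GroupTags', adds an entry,
and writes it back out with the same option so that the "<dir>" tags
are re-introduced. The extra directory '/opt/bin' is an arbitrary value
chosen for the example.

```perl
use strict;
use warnings;
use XML::Simple;

my %opts = (GroupTags => { searchpath => 'dir' });

my $xml_in = '<opt><searchpath>'
           . '<dir>/usr/bin</dir><dir>/usr/local/bin</dir>'
           . '</searchpath></opt>';

# 'searchpath' comes back as a plain arrayref of directory names.
my $opt = XMLin($xml_in, %opts);
push @{ $opt->{searchpath} }, '/opt/bin';

# The same mapping wraps each entry in <dir> again on output.
my $xml_out = XMLout($opt, %opts);
print $xml_out;
```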
488
489 Handler => object_ref # out - SAX only
490
491 Use the 'Handler' option to have "XMLout()" generate SAX events rather
492 than returning a string of XML. For more details see "SAX SUPPORT"
493 below.
494
495 Note: the current implementation of this option generates a string of
496 XML and uses a SAX parser to translate it into SAX events. The normal
497 encoding rules apply here - your data must be UTF8 encoded unless you
498 specify an alternative encoding via the 'XMLDecl' option; and by the
499 time the data reaches the handler object, it will be in UTF8 form
500 regardless of the encoding you supply. A future implementation of this
501 option may generate the events directly.
502
503 KeepRoot => 1 # in+out - handy
504
505 In its attempt to return a data structure free of superfluous detail
506 and unnecessary levels of indirection, "XMLin()" normally discards the
507 root element name. Setting the 'KeepRoot' option to '1' will cause the
508 root element name to be retained. So after executing this code:
509
510 $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)
511
You'll be able to reference the tempdir as
"$config->{config}->{tempdir}" instead of the default
"$config->{tempdir}".
514
515 Similarly, setting the 'KeepRoot' option to '1' will tell "XMLout()"
516 that the data structure already contains a root element name and it is
517 not necessary to add another.
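
A short sketch of 'KeepRoot' used in both directions, based on the
"<config>" example above. With the option set on output, the 'config'
key in the data supplies the root element name instead of the default
'opt'.

```perl
use strict;
use warnings;
use XML::Simple;

# The root element name is kept as the single top-level hash key.
my $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1);
print $config->{config}{tempdir}, "\n";

# On output, that key becomes the root element again.
my $xml = XMLout($config, KeepRoot => 1);
print $xml;
```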
518
519 KeyAttr => [ list ] # in+out - important
520
521 This option controls the 'array folding' feature which translates
522 nested elements from an array to a hash. It also controls the 'unfold‐
523 ing' of hashes to arrays.
524
525 For example, this XML:
526
527 <opt>
528 <user login="grep" fullname="Gary R Epstein" />
529 <user login="stty" fullname="Simon T Tyson" />
530 </opt>
531
532 would, by default, parse to this:
533
534 {
535 'user' => [
536 {
537 'login' => 'grep',
538 'fullname' => 'Gary R Epstein'
539 },
540 {
541 'login' => 'stty',
542 'fullname' => 'Simon T Tyson'
543 }
544 ]
545 }
546
547 If the option 'KeyAttr => "login"' were used to specify that the
548 'login' attribute is a key, the same XML would parse to:
549
550 {
551 'user' => {
552 'stty' => {
553 'fullname' => 'Simon T Tyson'
554 },
555 'grep' => {
556 'fullname' => 'Gary R Epstein'
557 }
558 }
559 }
560
561 The key attribute names should be supplied in an arrayref if there is
562 more than one. "XMLin()" will attempt to match attribute names in the
563 order supplied. "XMLout()" will use the first attribute name supplied
564 when 'unfolding' a hash into an array.
565
Note 1: The default value for 'KeyAttr' is ['name', 'key', 'id']. If
you do not want folding on input or unfolding on output you must set
this option to an empty list to disable the feature.
569
570 Note 2: If you wish to use this option, you should also enable the
571 "ForceArray" option. Without 'ForceArray', a single nested element
572 will be rolled up into a scalar rather than an array and therefore will
573 not be folded (since only arrays get folded).
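
The two notes above can be seen in a small sketch. It uses a 'name'
attribute (rather than 'login') so that the default 'KeyAttr' value
applies; the user records are otherwise the ones from the example.

```perl
use strict;
use warnings;
use XML::Simple;

my $two_users = '<opt>'
    . '<user name="grep" fullname="Gary R Epstein" />'
    . '<user name="stty" fullname="Simon T Tyson" />'
    . '</opt>';
my $one_user = '<opt><user name="grep" fullname="Gary R Epstein" /></opt>';

# Note 1: the default KeyAttr folds on 'name'; an empty list turns that off.
my $folded = XMLin($two_users);                  # user => { grep => ..., stty => ... }
my $listed = XMLin($two_users, KeyAttr => []);   # user => [ {...}, {...} ]

# Note 2: a single element is only folded when ForceArray is enabled.
my $unfolded_single = XMLin($one_user, KeyAttr => 'name');
my $folded_single   = XMLin($one_user, KeyAttr => 'name', ForceArray => 1);

print join(', ', sort keys %{ $folded->{user} }), "\n";
```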
574
575 KeyAttr => { list } # in+out - important
576
This alternative (and preferred) method of specifying the key
attributes allows more fine grained control over which elements are
folded and on which attributes. For example the option 'KeyAttr => {
package => 'id' }' will cause any package elements to be folded on the
'id' attribute. No other elements which have an 'id' attribute will be
folded at all.
583
584 Note: "XMLin()" will generate a warning (or a fatal error in "STRICT
585 MODE") if this syntax is used and an element which does not have the
586 specified key attribute is encountered (eg: a 'package' element without
587 an 'id' attribute, to use the example above). Warnings will only be
588 generated if -w is in force.
589
590 Two further variations are made possible by prefixing a '+' or a '-'
591 character to the attribute name:
592
593 The option 'KeyAttr => { user => "+login" }' will cause this XML:
594
595 <opt>
596 <user login="grep" fullname="Gary R Epstein" />
597 <user login="stty" fullname="Simon T Tyson" />
598 </opt>
599
600 to parse to this data structure:
601
602 {
603 'user' => {
604 'stty' => {
605 'fullname' => 'Simon T Tyson',
606 'login' => 'stty'
607 },
608 'grep' => {
609 'fullname' => 'Gary R Epstein',
610 'login' => 'grep'
611 }
612 }
613 }
614
615 The '+' indicates that the value of the key attribute should be copied
616 rather than moved to the folded hash key.
617
618 A '-' prefix would produce this result:
619
620 {
621 'user' => {
622 'stty' => {
623 'fullname' => 'Simon T Tyson',
624 '-login' => 'stty'
625 },
626 'grep' => {
627 'fullname' => 'Gary R Epstein',
628 '-login' => 'grep'
629 }
630 }
631 }
632
633 As described earlier, "XMLout" will ignore hash keys starting with a
634 '-'.
635
636 NoAttr => 1 # in+out - handy
637
638 When used with "XMLout()", the generated XML will contain no
639 attributes. All hash key/values will be represented as nested elements
640 instead.
641
642 When used with "XMLin()", any attributes in the XML will be ignored.
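
A brief sketch of 'NoAttr' in each direction. The 'server' record and
its field names are invented for the example.

```perl
use strict;
use warnings;
use XML::Simple;

# Output: every key/value pair becomes a nested element, not an attribute.
my $xml = XMLout(
    { host => 'sahara', port => 8080 },
    NoAttr   => 1,
    RootName => 'server',
);
print $xml;

# Input: the 'id' attribute is ignored, only the nested element survives.
my $ref = XMLin('<server id="7"><host>sahara</host></server>', NoAttr => 1);
print join(', ', sort keys %$ref), "\n";
```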
643
644 NoEscape => 1 # out - seldom used
645
By default, "XMLout()" will translate the characters '<', '>', '&' and
'"' to '&lt;', '&gt;', '&amp;' and '&quot;' respectively. Use this
option to suppress escaping (presumably because you've already escaped
the data in some more sophisticated manner).
650
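To illustrate 'NoEscape', the sketch below serialises the same
(made-up) value with and without the option. Note that the second
result is not well formed XML, which is why the option assumes the data
has already been escaped by the caller.

```perl
use strict;
use warnings;
use XML::Simple;

my $data = { note => 'a < b & c' };

# Default: special characters are replaced with entities.
my $escaped = XMLout($data);

# NoEscape: the value is emitted exactly as supplied.
my $raw = XMLout($data, NoEscape => 1);

print $escaped, $raw;
```
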
651 NoIndent => 1 # out - seldom used
652
653 Set this option to 1 to disable "XMLout()"'s default 'pretty printing'
654 mode. With this option enabled, the XML output will all be on one line
655 (unless there are newlines in the data) - this may be easier for down‐
656 stream processing.
657
658 NoSort => 1 # out - seldom used
659
660 Newer versions of XML::Simple sort elements and attributes alphabeti‐
661 cally (*), by default. Enable this option to suppress the sorting -
662 possibly for backwards compatibility.
663
664 * Actually, sorting is alphabetical but 'key' attribute or element
665 names (as in 'KeyAttr') sort first. Also, when a hash of hashes is
666 'unfolded', the elements are sorted alphabetically by the value of the
667 key field.
668
NormaliseSpace => 0 | 1 | 2 # in - handy
670
671 This option controls how whitespace in text content is handled. Recog‐
672 nised values for the option are:
673
674 · 0 = (default) whitespace is passed through unaltered (except of
675 course for the normalisation of whitespace in attribute values
676 which is mandated by the XML recommendation)
677
678 · 1 = whitespace is normalised in any value used as a hash key (nor‐
679 malising means removing leading and trailing whitespace and col‐
680 lapsing sequences of whitespace characters to a single space)
681
682 · 2 = whitespace is normalised in all text content
683
684 Note: you can spell this option with a 'z' if that is more natural for
685 you.
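
A minimal sketch of level 2, using an invented "<title>" element whose
text contains leading, trailing and embedded runs of whitespace.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = "<opt><title>  Hello \n   World </title></opt>";

# Level 2: trim both ends and collapse internal runs in all text content.
my $tidy = XMLin($xml, NormaliseSpace => 2);

print "[$tidy->{title}]\n";
```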
686
687 NSExpand => 1 # in+out handy - SAX only
688
689 This option controls namespace expansion - the translation of element
690 and attribute names of the form 'prefix:name' to '{uri}name'. For
691 example the element name 'xsl:template' might be expanded to:
692 '{http://www.w3.org/1999/XSL/Transform}template'.
693
By default, "XMLin()" will return element names and attribute names
exactly as they appear in the XML. Setting this option to 1 will cause
all element and attribute names to be expanded to the '{uri}name' form
described above.
698
699 Note: You must be using a SAX parser for this option to work (ie: it
700 does not work with XML::Parser).
701
702 This option also controls whether "XMLout()" performs the reverse
703 translation from '{uri}name' back to 'prefix:name'. The default is no
704 translation. If your data contains expanded names, you should set this
705 option to 1 otherwise "XMLout" will emit XML which is not well formed.
706
707 Note: You must have the XML::NamespaceSupport module installed if you
708 want "XMLout()" to translate URIs back to prefixes.
709
NumericEscape => 0 | 1 | 2 # out - handy
711
Use this option to have 'high' (non-ASCII) characters in your Perl data
structure converted to numeric entities (eg: &#8364;) in the XML
output. Three levels are possible:
715
716 0 - default: no numeric escaping (OK if you're writing out UTF8)
717
718 1 - only characters above 0xFF are escaped (ie: characters in the
719 0x80-FF range are not escaped), possibly useful with ISO8859-1 output
720
721 2 - all characters above 0x7F are escaped (good for plain ASCII output)
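
The difference between levels 1 and 2 can be sketched with two sample
values (both invented): one containing U+00E9, which falls in the
0x80-FF range, and one containing the euro sign U+20AC, which is above
0xFF.

```perl
use strict;
use warnings;
use XML::Simple;

my $data = {
    name  => "caf\x{e9}",    # e-acute, inside the 0x80-FF range
    price => "\x{20ac}5",    # euro sign, above 0xFF
};

# Level 2: everything above 0x7F is escaped, so the result is plain ASCII.
my $ascii = XMLout($data, NumericEscape => 2);

# Level 1: only the euro sign is escaped; the e-acute is left as a character.
my $latin = XMLout($data, NumericEscape => 1);

print $ascii;
```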
722
723 OutputFile => <file specifier> # out - handy
724
725 The default behaviour of "XMLout()" is to return the XML as a string.
726 If you wish to write the XML to a file, simply supply the filename
727 using the 'OutputFile' option.
728
729 This option also accepts an IO handle object - especially useful in
730 Perl 5.8.0 and later for output using an encoding other than UTF-8, eg:
731
732 open my $fh, '>:encoding(iso-8859-1)', $path or die "open($path): $!";
733 XMLout($ref, OutputFile => $fh);
734
735 Note, XML::Simple does not require that the object you pass in to the
736 OutputFile option inherits from IO::Handle - it simply assumes the
737 object supports a "print" method.
738
739 ParserOpts => [ XML::Parser Options ] # in - don't use this
740
741 Note: This option is now officially deprecated. If you find it useful,
742 email the author with an example of what you use it for. Do not use
743 this option to set the ProtocolEncoding, that's just plain wrong - fix
744 the XML.
745
746 This option allows you to pass parameters to the constructor of the
747 underlying XML::Parser object (which of course assumes you're not using
748 SAX).
749
750 RootName => 'string' # out - handy
751
752 By default, when "XMLout()" generates XML, the root element will be
753 named 'opt'. This option allows you to specify an alternative name.
754
755 Specifying either undef or the empty string for the RootName option
756 will produce XML with no root elements. In most cases the resulting
757 XML fragment will not be 'well formed' and therefore could not be read
758 back in by "XMLin()". Nevertheless, the option has been found to be
759 useful in certain circumstances.
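
For example, a sketch that names the root element 'config' instead of
the default 'opt' (the 'logdir' value is taken from the configuration
file shown earlier):

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = XMLout({ logdir => '/var/log/foo/' }, RootName => 'config');
print $xml;
```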
760
761 SearchPath => [ list ] # in - handy
762
If you pass "XMLin()" a filename, but the filename includes no directory
component, you can use this option to specify which directories should
be searched to locate the file. You might use this option to search
first in the user's home directory, then in a global directory such as
/etc.
768
769 If a filename is provided to "XMLin()" but SearchPath is not defined,
770 the file is assumed to be in the current directory.
771
772 If the first parameter to "XMLin()" is undefined, the default Search‐
773 Path will contain only the directory in which the script itself is
774 located. Otherwise the default SearchPath will be empty.
775
SuppressEmpty => 1 | '' | undef # in+out - handy
777
778 This option controls what "XMLin()" should do with empty elements (no
779 attributes and no content). The default behaviour is to represent them
780 as empty hashes. Setting this option to a true value (eg: 1) will
781 cause empty elements to be skipped altogether. Setting the option to
782 'undef' or the empty string will cause empty elements to be represented
783 as the undefined value or the empty string respectively. The latter
784 two alternatives are a little easier to test for in your code than a
785 hash with no keys.
786
787 The option also controls what "XMLout()" does with undefined values.
788 Setting the option to undef causes undefined values to be output as
789 empty elements (rather than empty attributes), it also suppresses the
790 generation of warnings about undefined values. Setting the option to a
791 true value (eg: 1) causes undefined values to be skipped altogether on
792 output.
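
The input side of 'SuppressEmpty' is summarised in the sketch below,
which parses one small document (with a made-up empty "<nickname>"
element) under each setting.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><name>bob</name><nickname /></opt>';

my $default = XMLin($xml);                           # nickname => {}
my $skipped = XMLin($xml, SuppressEmpty => 1);       # no 'nickname' key
my $blank   = XMLin($xml, SuppressEmpty => '');      # nickname => ''
my $undef   = XMLin($xml, SuppressEmpty => undef);   # nickname => undef

print exists $skipped->{nickname} ? "kept\n" : "skipped\n";
```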
793
794 ValueAttr => [ names ] # in - handy
795
Use this option to deal with elements which always have a single
attribute and no content. Eg:
798
799 <opt>
800 <colour value="red" />
801 <size value="XXL" />
802 </opt>
803
804 Setting "ValueAttr => [ 'value' ]" will cause the above XML to parse
805 to:
806
807 {
808 colour => 'red',
809 size => 'XXL'
810 }
811
812 instead of this (the default):
813
814 {
815 colour => { value => 'red' },
816 size => { value => 'XXL' }
817 }
818
819 Note: This form of the ValueAttr option is not compatible with
820 "XMLout()" - since the attribute name is discarded at parse time, the
821 original XML cannot be reconstructed.
822
823 ValueAttr => { element => attribute, ... } # in+out - handy
824
825 This (preferred) form of the ValueAttr option requires you to specify
826 both the element and the attribute names. This is not only safer, it
827 also allows the original XML to be reconstructed by "XMLout()".
828
829 Note: You probably don't want to use this option and the NoAttr option
830 at the same time.
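
A sketch of the hash form used in both directions, reusing the colour
and size example from the list form above. Passing the same mapping to
"XMLout()" lets it put the 'value' attributes back.

```perl
use strict;
use warnings;
use XML::Simple;

my %opts = (ValueAttr => { colour => 'value', size => 'value' });

my $shirt = XMLin(
    '<opt><colour value="red" /><size value="XXL" /></opt>',
    %opts,
);
print "$shirt->{colour} / $shirt->{size}\n";

# The element => attribute mapping is applied in reverse on output.
my $xml = XMLout($shirt, %opts);
print $xml;
```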
831
832 Variables => { name => value } # in - handy
833
This option allows variables in the XML to be expanded when the file is
read. (There is no facility for putting the variable names back if you
regenerate XML using "XMLout").
837
838 A 'variable' is any text of the form "${name}" which occurs in an
839 attribute value or in the text content of an element. If 'name'
840 matches a key in the supplied hashref, "${name}" will be replaced with
841 the corresponding value from the hashref. If no matching key is found,
842 the variable will not be replaced. Names must match the regex:
843 "[\w.]+" (ie: only 'word' characters and dots are allowed).
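
A small sketch of variable expansion. The variable names 'prefix' and
'unknown' are invented; the XML is single-quoted so that Perl itself
does not try to interpolate "${prefix}".

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt>'
        . '<bindir>${prefix}/bin</bindir>'
        . '<logdir>${unknown}/log</logdir>'
        . '</opt>';

my $conf = XMLin($xml, Variables => { prefix => '/usr/local' });

# 'prefix' is expanded; 'unknown' has no matching key and is left alone.
print "$conf->{bindir}\n$conf->{logdir}\n";
```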
844
845 VarAttr => 'attr_name' # in - handy
846
847 In addition to the variables defined using "Variables", this option
848 allows variables to be defined in the XML. A variable definition con‐
849 sists of an element with an attribute called 'attr_name' (the value of
850 the "VarAttr" option). The value of the attribute will be used as the
851 variable name and the text content of the element will be used as the
852 value. A variable defined in this way will override a variable defined
853 using the "Variables" option. For example:
854
855 XMLin( '<opt>
856 <dir name="prefix">/usr/local/apache</dir>
857 <dir name="exec_prefix">${prefix}</dir>
858 <dir name="bindir">${exec_prefix}/bin</dir>
859 </opt>',
860 VarAttr => 'name', ContentKey => '-content'
861 );
862
863 produces the following data structure:
864
865 {
866 dir => {
867 prefix => '/usr/local/apache',
868 exec_prefix => '/usr/local/apache',
869 bindir => '/usr/local/apache/bin',
870 }
871 }
872
873 XMLDecl => 1 or XMLDecl => 'string' # out - handy
874
875 If you want the output from "XMLout()" to start with the optional XML
876 declaration, simply set the option to '1'. The default XML declaration
877 is:
878
879 <?xml version='1.0' standalone='yes'?>
880
881 If you want some other string (for example to declare an encoding
882 value), set the value of this option to the complete string you
883 require.
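
The two forms of the option might be used like this. The data and the custom declaration string are only examples; supplying an encoding in the declaration labels the output but is not assumed here to re-encode it.

```perl
use strict;
use warnings;
use XML::Simple;

my $data = { name => 'example' };

# XMLDecl => 1 asks for the default declaration.
my $default_decl = XMLout($data, XMLDecl => 1);

# Any other true value is used verbatim as the declaration line.
my $custom_decl = XMLout(
    $data,
    XMLDecl => q{<?xml version="1.0" encoding="UTF-8"?>},
);

print $default_decl;
print $custom_decl;
```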
884
886 The procedural interface is both simple and convenient; however, there
887 are a couple of reasons why you might prefer to use the object oriented
888 (OO) interface:
889
890 · to define a set of default values which should be used on all sub‐
891 sequent calls to "XMLin()" or "XMLout()"
892
893 · to override methods in XML::Simple to provide customised behaviour
894
895 The default values for the options described above are unlikely to suit
896 everyone. The OO interface allows you to effectively override
897 XML::Simple's defaults with your preferred values. It works like this:
898
899 First create an XML::Simple parser object with your preferred defaults:
900
901 my $xs = XML::Simple->new(ForceArray => 1, KeepRoot => 1);
902
903 then call "XMLin()" or "XMLout()" as a method of that object:
904
905 my $ref = $xs->XMLin($xml);
906 my $xml = $xs->XMLout($ref);
907
908 You can also specify options when you make the method calls and these
909 values will be merged with the values specified when the object was
910 created. Values specified in a method call take precedence.
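
A short sketch of how constructor defaults and per-call options interact. The XML and the option values are chosen only for illustration.

```perl
use strict;
use warnings;
use XML::Simple;

# Defaults that apply to every call made through this object.
my $xs = XML::Simple->new(ForceArray => 1, KeyAttr => []);

my $xml = '<opt><item>a</item></opt>';

# Uses the constructor defaults: 'item' is forced into an array.
my $forced = $xs->XMLin($xml);

# An option given in the call wins over the constructor default...
my $plain = $xs->XMLin($xml, ForceArray => 0);

# ...but only for that call; the object's defaults are unchanged.
my $again = $xs->XMLin($xml);

print ref($forced->{item}), " / $plain->{item} / ", ref($again->{item}), "\n";
```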
911
912 Overriding methods is a more advanced topic, but it might be useful if,
913 for example, you wished to provide an alternative routine for escaping
914 character data (the escape_value method) or for building the initial
915 parse tree (the build_tree method).
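
One way this could look is a small subclass that extends the escape_value method. The package name My::XMLSimple is hypothetical, and the sketch assumes escape_value receives the raw string and returns the escaped string, so check the source of your installed version before relying on it.

```perl
package My::XMLSimple;    # hypothetical subclass name
use strict;
use warnings;
use parent 'XML::Simple';

# Assumption: escape_value($data) returns the escaped form of $data.
sub escape_value {
    my ($self, $data) = @_;

    # Let the parent class do its normal escaping first...
    $data = $self->SUPER::escape_value($data);

    # ...then additionally escape apostrophes.
    $data =~ s/'/&apos;/g;
    return $data;
}

package main;

my $xs  = My::XMLSimple->new();
my $out = $xs->XMLout({ name => "O'Brien" });
print $out;
```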
916
917 Note: when called as methods, the "XMLin()" and "XMLout()" routines may
918 be called as "xml_in()" or "xml_out()". The method names are aliased
919 so the only difference is the aesthetics.
920
922 If you import the XML::Simple routines like this:
923
924 use XML::Simple qw(:strict);
925
926 the following common mistakes will be detected and treated as fatal
927 errors:
928
929 · Failing to explicitly set the "KeyAttr" option - if you can't be
930 bothered reading about this option, turn it off with:
931 KeyAttr => [ ]
932
933 · Failing to explicitly set the "ForceArray" option - if you can't be
934 bothered reading about this option, set it to the safest mode with:
935 ForceArray => 1
936
937 · Setting ForceArray to an array, but failing to list all the ele‐
938 ments from the KeyAttr hash.
939
940 · Data error - KeyAttr is set to, say, { part => 'partnum' } but the
941 XML contains one or more <part> elements without a 'partnum'
942 attribute (or nested element). Note: if strict mode is not set but
943 -w is, this condition triggers a warning.
944
945 · Data error - as above, but value of key attribute (eg: partnum) is
946 not a scalar string (due to nested elements etc). This will also
947 trigger a warning if strict mode is not enabled.
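
The first two checks in the list can be seen in a small sketch like the following, where the XML is a made-up fragment and the failing call is wrapped in eval so the script can carry on.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><item>a</item></opt>';

# Neither KeyAttr nor ForceArray is given, so strict mode makes XMLin() die.
my $ref   = eval { XMLin($xml) };
my $error = $@;
print "strict mode refused the call: $error" if $error;

# Setting both options explicitly satisfies strict mode.
my $ok = XMLin($xml, KeyAttr => [], ForceArray => 1);
print "item: $ok->{item}[0]\n";
```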
948
950 From version 1.08_01, XML::Simple includes support for SAX (the Simple
951 API for XML) - specifically SAX2.
952
953 In a typical SAX application, an XML parser (or SAX 'driver') module
954 generates SAX events (start of element, character data, end of element,
955 etc) as it parses an XML document and a 'handler' module processes the
956 events to extract the required data. This simple model allows for some
957 interesting and powerful possibilities:
958
959 · Applications written to the SAX API can extract data from huge XML
960 documents without the memory overheads of a DOM or tree API.
961
962 · The SAX API allows for plug and play interchange of parser modules
963 without having to change your code to fit a new module's API. A
964 number of SAX parsers are available with capabilities ranging from
965 extreme portability to blazing performance.
966
967 · A SAX 'filter' module can implement both a handler interface for
968 receiving data and a generator interface for passing modified data
969 on to a downstream handler. Filters can be chained together in
970 'pipelines'.
971
972 · One filter module might split a data stream to direct data to two
973 or more downstream handlers.
974
975 · Generating SAX events is not the exclusive preserve of XML parsing
976 modules. For example, a module might extract data from a rela‐
977 tional database using DBI and pass it on to a SAX pipeline for fil‐
978 tering and formatting.
979
980 XML::Simple can operate at either end of a SAX pipeline. For example,
981 you can take a data structure in the form of a hashref and pass it into
982 a SAX pipeline using the 'Handler' option on "XMLout()":
983
984 use XML::Simple;
985 use Some::SAX::Filter;
986 use XML::SAX::Writer;
987
988 my $ref = {
989 .... # your data here
990 };
991
992 my $writer = XML::SAX::Writer->new();
993 my $filter = Some::SAX::Filter->new(Handler => $writer);
994 my $simple = XML::Simple->new(Handler => $filter);
995 $simple->XMLout($ref);
996
997 You can also put XML::Simple at the opposite end of the pipeline to
998 take advantage of the simple 'tree' data structure once the relevant
999 data has been isolated through filtering:
1000
1001 use XML::SAX;
1002 use Some::SAX::Filter;
1003 use XML::Simple;
1004
1005 my $simple = XML::Simple->new(ForceArray => 1, KeyAttr => ['partnum']);
1006 my $filter = Some::SAX::Filter->new(Handler => $simple);
1007 my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1008
1009 my $ref = $parser->parse_uri('some_huge_file.xml');
1010
1011 print $ref->{part}->{'555-1234'};
1012
1013 You can build a filter by using an XML::Simple object as a handler and
1014 setting its DataHandler option to point to a routine which takes the
1015 resulting tree, modifies it and sends it off as SAX events to a down‐
1016 stream handler:
1017
1018 my $writer = XML::SAX::Writer->new();
1019 my $filter = XML::Simple->new(
1020 DataHandler => sub {
1021 my $simple = shift;
1022 my $data = shift;
1023
1024 # Modify $data here
1025
1026 $simple->XMLout($data, Handler => $writer);
1027 }
1028 );
1029 my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1030
1031 $parser->parse_uri($filename);
1032
1033 Note: In this last example, the 'Handler' option was specified in the
1034 call to "XMLout()" but it could also have been specified in the con‐
1035 structor.
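
The examples above use a placeholder filter class. As a rough sketch of what such a filter might be, the following builds one on XML::SAX::Base (assumed to be installed as part of XML::SAX). The class name My::UpperCaseFilter and the input document are invented for illustration.

```perl
package My::UpperCaseFilter;    # hypothetical filter class
use strict;
use warnings;
use parent 'XML::SAX::Base';

# Upper-case all character data and pass every other event through untouched.
sub characters {
    my ($self, $chars) = @_;

    # Send a modified copy downstream rather than altering the parser's event.
    return $self->SUPER::characters({ %$chars, Data => uc $chars->{Data} });
}

package main;

use XML::SAX;
use XML::Simple;

# XML::Simple sits at the end of the pipeline and builds the data structure.
my $simple = XML::Simple->new(ForceArray => 0, KeyAttr => []);
my $filter = My::UpperCaseFilter->new(Handler => $simple);
my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);

my $ref = $parser->parse_string('<opt><name>sahara</name></opt>');
print "$ref->{name}\n";
```

Anything the filter does not override is forwarded to the downstream handler by XML::SAX::Base, which is why only one method is needed here.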
1036
1038 If you don't care which parser module XML::Simple uses then skip this
1039 section entirely (it looks more complicated than it really is).
1040
1041 XML::Simple will default to using a SAX parser if one is available or
1042 XML::Parser if SAX is not available.
1043
1044 You can dictate which parser module is used by setting either the envi‐
1045 ronment variable 'XML_SIMPLE_PREFERRED_PARSER' or the package variable
1046 $XML::Simple::PREFERRED_PARSER to contain the module name. The follow‐
1047 ing rules are used:
1048
1049 · The package variable takes precedence over the environment variable
1050 if both are defined. To force XML::Simple to ignore the environ‐
1051 ment settings and use its default rules, you can set the package
1052 variable to an empty string.
1053
1054 · If the 'preferred parser' is set to the string 'XML::Parser', then
1055 XML::Parser will be used (or "XMLin()" will die if XML::Parser is
1056 not installed).
1057
1058 · If the 'preferred parser' is set to some other value, then it is
1059 assumed to be the name of a SAX parser module and is passed to
1060 XML::SAX::ParserFactory. If XML::SAX is not installed, or the
1061 requested parser module is not installed, then "XMLin()" will die.
1062
1063 · If the 'preferred parser' is not defined at all (the normal default
1064 state), an attempt will be made to load XML::SAX. If XML::SAX is
1065 installed, then a parser module will be selected according to
1066 XML::SAX::ParserFactory's normal rules (which typically means the
1067 last SAX parser installed).
1068
1069 · If the 'preferred parser' is not defined and XML::SAX is not
1070 installed, then XML::Parser will be used. "XMLin()" will die if
1071 XML::Parser is not installed.
1072
1073 Note: The XML::SAX distribution includes an XML parser written entirely
1074 in Perl. It is very portable but it is not very fast. You should con‐
1075 sider installing XML::LibXML or XML::SAX::Expat if they are available
1076 for your platform.
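
A minimal sketch of selecting a parser through the package variable. It assumes nothing about which modules are installed, so the call is wrapped in eval and either outcome is reported.

```perl
use strict;
use warnings;
use XML::Simple;

# Ask for XML::Parser explicitly. Setting this to an empty string instead
# would make XML::Simple ignore the environment and use its default rules.
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

my $ref = eval { XMLin('<opt logdir="/var/log/foo/" />') };
my $err = $@;

if ($err) {
    # XMLin() dies if the requested parser module is not installed.
    print "XML::Parser could not be used: $err";
}
else {
    print "Parsed with XML::Parser: $ref->{logdir}\n";
}
```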
1077
1079 The XML standard is very clear on the issue of non-compliant documents.
1080 An error in parsing any single element (for example a missing end tag)
1081 must cause the whole document to be rejected. XML::Simple will die
1082 with an appropriate message if it encounters a parsing error.
1083
1084 If dying is not appropriate for your application, you should arrange to
1085 call "XMLin()" in an eval block and look for errors in $@. eg:
1086
1087 my $config = eval { XMLin() };
1088 PopUpMessage($@) if($@);
1089
1090 Note, there is a common misconception that use of eval will signifi‐
1091 cantly slow down a script. While that may be true when the code being
1092 eval'd is in a string, it is not true of code like the sample above.
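
To see the behaviour with a deliberately broken document, a sketch such as this could be used. The XML is invented, and printing the message stands in for whatever error reporting suits your application.

```perl
use strict;
use warnings;
use XML::Simple;

# The <server> element is never closed, so the document is not well formed.
my $bad_xml = '<opt><server name="sahara"></opt>';

my $config = eval { XMLin($bad_xml) };
my $error  = $@;

if ($error) {
    # The whole document is rejected; $config is undefined at this point.
    print "Could not parse configuration: $error";
}
```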
1093
1095 When "XMLin()" reads the following very simple piece of XML:
1096
1097 <opt username="testuser" password="frodo"></opt>
1098
1099 it returns the following data structure:
1100
1101 {
1102 'username' => 'testuser',
1103 'password' => 'frodo'
1104 }
1105
1106 The identical result could have been produced with this alternative
1107 XML:
1108
1109 <opt username="testuser" password="frodo" />
1110
1111 Or this (although see 'ForceArray' option for variations):
1112
1113 <opt>
1114 <username>testuser</username>
1115 <password>frodo</password>
1116 </opt>
1117
1118 Repeated nested elements are represented as anonymous arrays:
1119
1120 <opt>
1121 <person firstname="Joe" lastname="Smith">
1122 <email>joe@smith.com</email>
1123 <email>jsmith@yahoo.com</email>
1124 </person>
1125 <person firstname="Bob" lastname="Smith">
1126 <email>bob@smith.com</email>
1127 </person>
1128 </opt>
1129
1130 {
1131 'person' => [
1132 {
1133 'email' => [
1134 'joe@smith.com',
1135 'jsmith@yahoo.com'
1136 ],
1137 'firstname' => 'Joe',
1138 'lastname' => 'Smith'
1139 },
1140 {
1141 'email' => 'bob@smith.com',
1142 'firstname' => 'Bob',
1143 'lastname' => 'Smith'
1144 }
1145 ]
1146 }
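
Note that in the structure above 'email' is an arrayref for Joe but a plain string for Bob. A sketch of one way to make the shape consistent, using the 'ForceArray' option with a list of element names:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<opt>
  <person firstname="Joe" lastname="Smith">
    <email>joe@smith.com</email>
    <email>jsmith@yahoo.com</email>
  </person>
  <person firstname="Bob" lastname="Smith">
    <email>bob@smith.com</email>
  </person>
</opt>
XML

# Force every <email> into an array, even when there is only one.
my $ref = XMLin($xml, ForceArray => ['email'], KeyAttr => []);

# The same loop now works for every person.
for my $person (@{ $ref->{person} }) {
    print "$person->{firstname}: @{ $person->{email} }\n";
}
```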
1147
1148 Nested elements with a recognised key attribute are transformed
1149 (folded) from an array into a hash keyed on the value of that attribute
1150 (see the "KeyAttr" option):
1151
1152 <opt>
1153 <person key="jsmith" firstname="Joe" lastname="Smith" />
1154 <person key="tsmith" firstname="Tom" lastname="Smith" />
1155 <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
1156 </opt>
1157
1158 {
1159 'person' => {
1160 'jbloggs' => {
1161 'firstname' => 'Joe',
1162 'lastname' => 'Bloggs'
1163 },
1164 'tsmith' => {
1165 'firstname' => 'Tom',
1166 'lastname' => 'Smith'
1167 },
1168 'jsmith' => {
1169 'firstname' => 'Joe',
1170 'lastname' => 'Smith'
1171 }
1172 }
1173 }
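
The folding can be requested explicitly or turned off. The sketch below parses the same XML both ways so the two shapes can be compared.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<opt>
  <person key="jsmith" firstname="Joe" lastname="Smith" />
  <person key="tsmith" firstname="Tom" lastname="Smith" />
  <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
</opt>
XML

# Fold <person> elements into a hash keyed on their 'key' attribute.
my $folded = XMLin($xml, KeyAttr => { person => 'key' });
print $folded->{person}{jbloggs}{lastname}, "\n";

# With folding disabled the elements stay in an array, in document order,
# and 'key' remains an ordinary attribute.
my $unfolded = XMLin($xml, KeyAttr => []);
print $unfolded->{person}[0]{key}, "\n";
```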
1174
1175 The <anon> tag can be used to form anonymous arrays:
1176
1177 <opt>
1178 <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
1179 <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
1180 <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
1181 <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
1182 </opt>
1183
1184 {
1185 'head' => [
1186 [ 'Col 1', 'Col 2', 'Col 3' ]
1187 ],
1188 'data' => [
1189 [ 'R1C1', 'R1C2', 'R1C3' ],
1190 [ 'R2C1', 'R2C2', 'R2C3' ],
1191 [ 'R3C1', 'R3C2', 'R3C3' ]
1192 ]
1193 }
1194
1195 Anonymous arrays can be nested to arbitrary levels and, as a special
1196 case, if the surrounding tags for an XML document contain only an
1197 anonymous array the arrayref will be returned directly rather than the
1198 usual hashref:
1199
1200 <opt>
1201 <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
1202 <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
1203 <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
1204 </opt>
1205
1206 [
1207 [ 'Col 1', 'Col 2' ],
1208 [ 'R1C1', 'R1C2' ],
1209 [ 'R2C1', 'R2C2' ]
1210 ]
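
Since the return value in this special case is an arrayref rather than a hashref, the calling code has to dereference it accordingly. A small sketch, using a cut-down version of the table above:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt>'
        . '<anon><anon>Col 1</anon><anon>Col 2</anon></anon>'
        . '<anon><anon>R1C1</anon><anon>R1C2</anon></anon>'
        . '</opt>';

# The document holds nothing but an anonymous array, so XMLin() returns
# the arrayref itself.
my $table = XMLin($xml);

for my $row (@$table) {
    print join(', ', @$row), "\n";
}
```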
1211
1212 Elements which only contain text content will simply be represented as
1213 a scalar. Where an element has both attributes and text content, the
1214 element will be represented as a hashref with the text content in the
1215 'content' key (see the "ContentKey" option):
1216
1217 <opt>
1218 <one>first</one>
1219 <two attr="value">second</two>
1220 </opt>
1221
1222 {
1223 'one' => 'first',
1224 'two' => { 'attr' => 'value', 'content' => 'second' }
1225 }
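
If 'content' is not a convenient key name, the "ContentKey" option can supply another. In this sketch the replacement name 'text' is an arbitrary choice.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><one>first</one><two attr="value">second</two></opt>';

# Default behaviour: text alongside attributes lands under 'content'.
my $default = XMLin($xml);
print "$default->{two}{content}\n";

# With ContentKey the same text is stored under the name you choose.
my $renamed = XMLin($xml, ContentKey => 'text');
print "$renamed->{two}{text}\n";

# Text-only elements are still plain scalars either way.
print "$renamed->{one}\n";
```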
1226
1227 Mixed content (elements which contain both text content and nested
1228 elements) will not be represented in a useful way - element order and
1229 significant whitespace will be lost. If you need to work with mixed
1230 content, then XML::Simple is not the right tool for your job - check
1231 out the next section.
1232
1234 XML::Simple is able to present a simple API because it makes some
1235 assumptions on your behalf. These include:
1236
1237 · You're not interested in text content consisting only of whitespace
1238
1239 · You don't mind that when things get slurped into a hash the order
1240 is lost
1241
1242 · You don't want fine-grained control of the formatting of generated
1243 XML
1244
1245 · You would never use a hash key that was not a legal XML element
1246 name
1247
1248 · You don't need help converting between different encodings
1249
1250 In a serious XML project, you'll probably outgrow these assumptions
1251 fairly quickly. This section of the document used to offer some advice
1252 on choosing a more powerful option. That advice has now grown into the
1253 'Perl-XML FAQ' document which you can find at:
1254 <http://perl-xml.sourceforge.net/faq/>
1255
1256 The advice in the FAQ boils down to a quick explanation of tree versus
1257 event based parsers and then recommends:
1258
1259 For event based parsing, use SAX (do not set out to write any new code
1260 for XML::Parser's handler API - it is obsolete).
1261
1262 For tree-based parsing, you could choose between the 'Perlish' approach
1263 of XML::Twig and more standards based DOM implementations - preferably
1264 one with XPath support.
1265
1267 XML::Simple requires either XML::Parser or XML::SAX.
1268
1269 To generate documents with namespaces, XML::NamespaceSupport is
1270 required.
1271
1272 The optional caching functions require Storable.
1273
1274 Answers to Frequently Asked Questions about XML::Simple are bundled
1275 with this distribution as: XML::Simple::FAQ
1276
1278 Copyright 1999-2004 Grant McLean <grantm@cpan.org>
1279
1280 This library is free software; you can redistribute it and/or modify it
1281 under the same terms as Perl itself.
1282
1283
1284
1285perl v5.8.8 2004-11-19 XML::Simple(3)