XML::Simple(3)        User Contributed Perl Documentation        XML::Simple(3)

NAME
XML::Simple - Easy API to maintain XML (esp config files)
7
SYNOPSIS
    use XML::Simple;

    my $ref = XMLin([<xml file or string>] [, <options>]);

    my $xml = XMLout($hashref [, <options>]);

Or the object oriented way:

    require XML::Simple;

    my $xs = XML::Simple->new(options);

    my $ref = $xs->XMLin([<xml file or string>] [, <options>]);

    my $xml = $xs->XMLout($hashref [, <options>]);

(or see "SAX SUPPORT" for 'the SAX way').

To catch common errors:

    use XML::Simple qw(:strict);

(see "STRICT MODE" for more details).
32
QUICK START
34 Say you have a script called foo and a file of configuration options
35 called foo.xml containing this:
36
    <config logdir="/var/log/foo/" debugfile="/tmp/foo.debug">
      <server name="sahara" osname="solaris" osversion="2.6">
        <address>10.0.0.101</address>
        <address>10.0.1.101</address>
      </server>
      <server name="gobi" osname="irix" osversion="6.5">
        <address>10.0.0.102</address>
      </server>
      <server name="kalahari" osname="linux" osversion="2.0.34">
        <address>10.0.0.103</address>
        <address>10.0.1.103</address>
      </server>
    </config>
50
51 The following lines of code in foo:
52
53 use XML::Simple;
54
55 my $config = XMLin();
56
57 will 'slurp' the configuration options into the hashref $config
58 (because no arguments are passed to "XMLin()" the name and location of
59 the XML file will be inferred from name and location of the script).
60 You can dump out the contents of the hashref using Data::Dumper:
61
62 use Data::Dumper;
63
64 print Dumper($config);
65
66 which will produce something like this (formatting has been adjusted
67 for brevity):
68
    {
        'logdir'    => '/var/log/foo/',
        'debugfile' => '/tmp/foo.debug',
        'server'    => {
            'sahara' => {
                'osversion' => '2.6',
                'osname'    => 'solaris',
                'address'   => [ '10.0.0.101', '10.0.1.101' ]
            },
            'gobi' => {
                'osversion' => '6.5',
                'osname'    => 'irix',
                'address'   => '10.0.0.102'
            },
            'kalahari' => {
                'osversion' => '2.0.34',
                'osname'    => 'linux',
                'address'   => [ '10.0.0.103', '10.0.1.103' ]
            }
        }
    }
90
Your script could then access the name of the log directory like this:

    print $config->{logdir};

similarly, the second address on the server 'kalahari' could be
referenced as:

    print $config->{server}->{kalahari}->{address}->[1];
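
One wrinkle in the structure shown above: 'gobi' has only one <address>, so its value is a plain scalar, while the other servers hold array references. The following is a minimal sketch (not part of the module) of calling code that copes with both shapes. The inline XML string and the variable names are illustrative assumptions.

```perl
use strict;
use warnings;
use XML::Simple;

# Cut-down copy of the foo.xml example, passed in as a string
my $xml = <<'XML';
<config logdir="/var/log/foo/">
  <server name="sahara">
    <address>10.0.0.101</address>
    <address>10.0.1.101</address>
  </server>
  <server name="gobi">
    <address>10.0.0.102</address>
  </server>
</config>
XML

my $config = XMLin($xml);

for my $name (sort keys %{ $config->{server} }) {
    my $addr = $config->{server}{$name}{address};

    # One <address> parses to a scalar, several parse to an arrayref
    my @addresses = ref($addr) eq 'ARRAY' ? @$addr : ($addr);
    print "$name: @addresses\n";
}
```

Setting the ForceArray option for 'address' (described under "OPTIONS") removes the need for the ref() test.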
99
100 What could be simpler? (Rhetorical).
101
102 For simple requirements, that's really all there is to it. If you want
103 to store your XML in a different directory or file, or pass it in as a
104 string or even pass it in via some derivative of an IO::Handle, you'll
105 need to check out "OPTIONS". If you want to turn off or tweak the
106 array folding feature (that neat little transformation that produced
107 $config->{server}) you'll find options for that as well.
108
109 If you want to generate XML (for example to write a modified version of
110 $config back out as XML), check out "XMLout()".
111
112 If your needs are not so simple, this may not be the module for you.
113 In that case, you might want to read "WHERE TO FROM HERE?".
114
DESCRIPTION
116 The XML::Simple module provides a simple API layer on top of an
117 underlying XML parsing module (either XML::Parser or one of the SAX2
118 parser modules). Two functions are exported: "XMLin()" and "XMLout()".
Note: you can explicitly request the lower case versions of the function
names: "xml_in()" and "xml_out()".
121
122 The simplest approach is to call these two functions directly, but an
123 optional object oriented interface (see "OPTIONAL OO INTERFACE" below)
124 allows them to be called as methods of an XML::Simple object. The
125 object interface can also be used at either end of a SAX pipeline.
126
127 XMLin()
128 Parses XML formatted data and returns a reference to a data structure
129 which contains the same information in a more readily accessible form.
130 (Skip down to "EXAMPLES" below, for more sample code).
131
132 "XMLin()" accepts an optional XML specifier followed by zero or more
133 'name => value' option pairs. The XML specifier can be one of the
134 following:
135
136 A filename
137 If the filename contains no directory components "XMLin()" will
138 look for the file in each directory in the SearchPath (see
139 "OPTIONS" below) or in the current directory if the SearchPath
140 option is not defined. eg:
141
142 $ref = XMLin('/etc/params.xml');
143
144 Note, the filename '-' can be used to parse from STDIN.
145
146 undef
147 If there is no XML specifier, "XMLin()" will check the script
148 directory and each of the SearchPath directories for a file with
149 the same name as the script but with the extension '.xml'. Note:
150 if you wish to specify options, you must specify the value 'undef'.
151 eg:
152
153 $ref = XMLin(undef, ForceArray => 1);
154
155 A string of XML
156 A string containing XML (recognised by the presence of '<' and '>'
157 characters) will be parsed directly. eg:
158
159 $ref = XMLin('<opt username="bob" password="flurp" />');
160
161 An IO::Handle object
162 An IO::Handle object will be read to EOF and its contents parsed.
163 eg:
164
165 $fh = IO::File->new('/etc/params.xml');
166 $ref = XMLin($fh);
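
The specifier forms described above can be compared side by side. This is a sketch only: it writes the XML to a temporary file (a generated path, not a real configuration location) so that the filename and IO::Handle forms have something to read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::File;
use XML::Simple;

my $xml = '<opt username="bob" password="flurp" />';

# Temporary file used purely for illustration
my ($tmp, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$tmp} $xml;
close $tmp or die "close($path): $!";

my $from_string = XMLin($xml);     # contains '<' and '>', so parsed directly
my $from_file   = XMLin($path);    # a filename with directory components

my $io = IO::File->new($path, 'r') or die "open($path): $!";
my $from_handle = XMLin($io);      # read to EOF, then parsed
$io->close;

# All three calls should yield the same hashref contents
print "$_->{username}\n" for $from_string, $from_file, $from_handle;
```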
167
168 XMLout()
169 Takes a data structure (generally a hashref) and returns an XML
170 encoding of that structure. If the resulting XML is parsed using
171 "XMLin()", it should return a data structure equivalent to the original
172 (see caveats below).
173
The "XMLout()" function can also be used to output the XML as SAX
events (see the "Handler" option and "SAX SUPPORT" for more details).
176
177 When translating hashes to XML, hash keys which have a leading '-' will
178 be silently skipped. This is the approved method for marking elements
179 of a data structure which should be ignored by "XMLout". (Note: If
180 these items were not skipped the key names would be emitted as element
181 or attribute names with a leading '-' which would not be valid XML).
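
As a small illustration of the leading '-' convention, the sketch below marks one hash key as private. The key name '-cache' and the data are made up for the example.

```perl
use strict;
use warnings;
use XML::Simple;

my $server = {
    name     => 'sahara',
    osname   => 'solaris',
    '-cache' => 'internal bookkeeping',   # leading '-': skipped on output
};

my $xml = XMLout($server, RootName => 'server');
print $xml;    # <server name="sahara" osname="solaris" />
```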
182
183 Caveats
184 Some care is required in creating data structures which will be passed
185 to "XMLout()". Hash keys from the data structure will be encoded as
186 either XML element names or attribute names. Therefore, you should use
187 hash key names which conform to the relatively strict XML naming rules:
188
189 Names in XML must begin with a letter. The remaining characters may be
190 letters, digits, hyphens (-), underscores (_) or full stops (.). It is
191 also allowable to include one colon (:) in an element name but this
192 should only be used when working with namespaces (XML::Simple can only
193 usefully work with namespaces when teamed with a SAX Parser).
194
195 You can use other punctuation characters in hash values (just not in
196 hash keys) however XML::Simple does not support dumping binary data.
197
198 If you break these rules, the current implementation of "XMLout()" will
199 simply emit non-compliant XML which will be rejected if you try to read
200 it back in. (A later version of XML::Simple might take a more
201 proactive approach).
202
203 Note also that although you can nest hashes and arrays to arbitrary
204 levels, circular data structures are not supported and will cause
205 "XMLout()" to die.
206
207 If you wish to 'round-trip' arbitrary data structures from Perl to XML
208 and back to Perl, then you should probably disable array folding (using
209 the KeyAttr option) both with "XMLout()" and with "XMLin()". If you
210 still don't get the expected results, you may prefer to use XML::Dumper
211 which is designed for exactly that purpose.
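
The effect of array folding on a round trip can be seen with a structure that happens to contain 'name' keys. This is a sketch with made-up data; it only compares the shape of the results.

```perl
use strict;
use warnings;
use XML::Simple;

my $data = {
    server => [
        { name => 'sahara', osname => 'solaris' },
        { name => 'gobi',   osname => 'irix'    },
    ],
};

my $xml = XMLout($data, KeyAttr => []);

my $folded = XMLin($xml);                   # default KeyAttr folds on 'name'
my $copy   = XMLin($xml, KeyAttr => []);    # folding disabled

print ref($folded->{server}), "\n";    # HASH  - no longer matches $data
print ref($copy->{server}),   "\n";    # ARRAY - same shape as $data
```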
212
213 Refer to "WHERE TO FROM HERE?" if "XMLout()" is too simple for your
214 needs.
215
OPTIONS
217 XML::Simple supports a number of options (in fact as each release of
218 XML::Simple adds more options, the module's claim to the name 'Simple'
219 becomes increasingly tenuous). If you find yourself repeatedly having
220 to specify the same options, you might like to investigate "OPTIONAL OO
221 INTERFACE" below.
222
223 If you can't be bothered reading the documentation, refer to "STRICT
224 MODE" to automatically catch common mistakes.
225
226 Because there are so many options, it's hard for new users to know
227 which ones are important, so here are the two you really need to know
228 about:
229
230 · check out "ForceArray" because you'll almost certainly want to turn
231 it on
232
233 · make sure you know what the "KeyAttr" option does and what its
234 default value is because it may surprise you otherwise (note in
235 particular that 'KeyAttr' affects both "XMLin" and "XMLout")
236
237 The option name headings below have a trailing 'comment' - a hash
238 followed by two pieces of metadata:
239
240 · Options are marked with 'in' if they are recognised by "XMLin()"
241 and 'out' if they are recognised by "XMLout()".
242
243 · Each option is also flagged to indicate whether it is:
244
245 'important' - don't use the module until you understand this one
246 'handy' - you can skip this on the first time through
247 'advanced' - you can skip this on the second time through
248 'SAX only' - don't worry about this unless you're using SAX (or
249 alternatively if you need this, you also need SAX)
250 'seldom used' - you'll probably never use this unless you were the
251 person that requested the feature
252
253 The options are listed alphabetically:
254
255 Note: option names are no longer case sensitive so you can use the
256 mixed case versions shown here; all lower case as required by versions
257 2.03 and earlier; or you can add underscores between the words (eg:
258 key_attr).
259
260 AttrIndent => 1 # out - handy
261 When you are using "XMLout()", enable this option to have attributes
262 printed one-per-line with sensible indentation rather than all on one
263 line.
264
265 Cache => [ cache schemes ] # in - advanced
266 Because loading the XML::Parser module and parsing an XML file can
267 consume a significant number of CPU cycles, it is often desirable to
268 cache the output of "XMLin()" for later reuse.
269
270 When parsing from a named file, XML::Simple supports a number of
271 caching schemes. The 'Cache' option may be used to specify one or more
272 schemes (using an anonymous array). Each scheme will be tried in turn
273 in the hope of finding a cached pre-parsed representation of the XML
274 file. If no cached copy is found, the file will be parsed and the
275 first cache scheme in the list will be used to save a copy of the
276 results. The following cache schemes have been implemented:
277
278 storable
279 Utilises Storable.pm to read/write a cache file with the same name
280 as the XML file but with the extension .stor
281
282 memshare
283 When a file is first parsed, a copy of the resulting data structure
284 is retained in memory in the XML::Simple module's namespace.
285 Subsequent calls to parse the same file will return a reference to
286 this structure. This cached version will persist only for the life
287 of the Perl interpreter (which in the case of mod_perl for example,
288 may be some significant time).
289
290 Because each caller receives a reference to the same data
291 structure, a change made by one caller will be visible to all. For
292 this reason, the reference returned should be treated as read-only.
293
294 memcopy
295 This scheme works identically to 'memshare' (above) except that
296 each caller receives a reference to a new data structure which is a
297 copy of the cached version. Copying the data structure will add a
298 little processing overhead, therefore this scheme should only be
299 used where the caller intends to modify the data structure (or
300 wishes to protect itself from others who might). This scheme uses
301 Storable.pm to perform the copy.
302
303 Warning! The memory-based caching schemes compare the timestamp on the
304 file to the time when it was last parsed. If the file is stored on an
305 NFS filesystem (or other network share) and the clock on the file
306 server is not exactly synchronised with the clock where your script is
307 run, updates to the source XML file may appear to be ignored.
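
A minimal sketch of the 'memcopy' scheme follows. The temporary file and its contents are assumptions made for the example, and the file is backdated so that it is clearly older than the cached parse (see the timestamp warning above).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Simple;

# Temporary file used purely for illustration
my ($tmp, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$tmp} '<opt username="bob" />';
close $tmp or die "close($path): $!";

# Make the file a minute old so the cache is not considered stale
utime(time - 60, time - 60, $path) or die "utime($path): $!";

my $first  = XMLin($path, Cache => [ 'memcopy' ]);
my $second = XMLin($path, Cache => [ 'memcopy' ]);

$first->{username} = 'alice';      # change one caller's copy
print "$second->{username}\n";     # bob - the other copy is unaffected
```

With 'memshare' in place of 'memcopy', both callers would be expected to share one structure, which is why that scheme's result should be treated as read-only.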
308
309 ContentKey => 'keyname' # in+out - seldom used
When text content is parsed to a hash value, this option lets you
specify a name for the hash key to override the default 'content'. So
for example:
313
314 XMLin('<opt one="1">Text</opt>', ContentKey => 'text')
315
316 will parse to:
317
318 { 'one' => 1, 'text' => 'Text' }
319
320 instead of:
321
322 { 'one' => 1, 'content' => 'Text' }
323
324 "XMLout()" will also honour the value of this option when converting a
325 hashref to XML.
326
327 You can also prefix your selected key name with a '-' character to have
328 "XMLin()" try a little harder to eliminate unnecessary 'content' keys
329 after array folding. For example:
330
331 XMLin(
332 '<opt><item name="one">First</item><item name="two">Second</item></opt>',
333 KeyAttr => {item => 'name'},
334 ForceArray => [ 'item' ],
335 ContentKey => '-content'
336 )
337
338 will parse to:
339
    {
      'item' => {
        'one' => 'First',
        'two' => 'Second'
      }
    }

rather than this (without the '-'):

    {
      'item' => {
        'one' => { 'content' => 'First' },
        'two' => { 'content' => 'Second' }
      }
    }
355
356 DataHandler => code_ref # in - SAX only
357 When you use an XML::Simple object as a SAX handler, it will return a
358 'simple tree' data structure in the same format as "XMLin()" would
359 return. If this option is set (to a subroutine reference), then when
360 the tree is built the subroutine will be called and passed two
361 arguments: a reference to the XML::Simple object and a reference to the
362 data tree. The return value from the subroutine will be returned to
363 the SAX driver. (See "SAX SUPPORT" for more details).
364
365 ForceArray => 1 # in - important
366 This option should be set to '1' to force nested elements to be
367 represented as arrays even when there is only one. Eg, with ForceArray
368 enabled, this XML:
369
370 <opt>
371 <name>value</name>
372 </opt>
373
374 would parse to this:
375
376 {
377 'name' => [
378 'value'
379 ]
380 }
381
382 instead of this (the default):
383
384 {
385 'name' => 'value'
386 }
387
388 This option is especially useful if the data structure is likely to be
389 written back out as XML and the default behaviour of rolling single
390 nested elements up into attributes is not desirable.
391
392 If you are using the array folding feature, you should almost certainly
393 enable this option. If you do not, single nested elements will not be
394 parsed to arrays and therefore will not be candidates for folding to a
hash. (Given that the default value of 'KeyAttr' enables array
folding, this option should probably have been enabled by default
too - sorry).
398
399 ForceArray => [ names ] # in - important
400 This alternative (and preferred) form of the 'ForceArray' option allows
401 you to specify a list of element names which should always be forced
402 into an array representation, rather than the 'all or nothing' approach
403 above.
404
405 It is also possible (since version 2.05) to include compiled regular
406 expressions in the list - any element names which match the pattern
407 will be forced to arrays. If the list contains only a single regex,
408 then it is not necessary to enclose it in an arrayref. Eg:
409
410 ForceArray => qr/_list$/
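
Names and compiled regular expressions can be mixed in the same list. The sketch below uses made-up element names to show which values end up as arrays.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><item>one</item><note>single</note><dir_list>/tmp</dir_list></opt>';

# 'item' is forced by name, 'dir_list' by the regex, 'note' is left alone
my $ref = XMLin($xml, ForceArray => [ 'item', qr/_list$/ ]);

for my $key (sort keys %$ref) {
    printf "%-8s %s\n", $key, ref($ref->{$key}) || 'scalar';
}
```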
411
412 ForceContent => 1 # in - seldom used
413 When "XMLin()" parses elements which have text content as well as
414 attributes, the text content must be represented as a hash value rather
415 than a simple scalar. This option allows you to force text content to
416 always parse to a hash value even when there are no attributes. So for
417 example:
418
419 XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)
420
421 will parse to:
422
423 {
424 'x' => { 'content' => 'text1' },
425 'y' => { 'a' => 2, 'content' => 'text2' }
426 }
427
428 instead of:
429
430 {
431 'x' => 'text1',
432 'y' => { 'a' => 2, 'content' => 'text2' }
433 }
434
435 GroupTags => { grouping tag => grouped tag } # in+out - handy
436 You can use this option to eliminate extra levels of indirection in
437 your Perl data structure. For example this XML:
438
439 <opt>
440 <searchpath>
441 <dir>/usr/bin</dir>
442 <dir>/usr/local/bin</dir>
443 <dir>/usr/X11/bin</dir>
444 </searchpath>
445 </opt>
446
447 Would normally be read into a structure like this:
448
449 {
450 searchpath => {
451 dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
452 }
453 }
454
455 But when read in with the appropriate value for 'GroupTags':
456
457 my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });
458
459 It will return this simpler structure:
460
461 {
462 searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
463 }
464
465 The grouping element ("<searchpath>" in the example) must not contain
466 any attributes or elements other than the grouped element.
467
468 You can specify multiple 'grouping element' to 'grouped element'
469 mappings in the same hashref. If this option is combined with
470 "KeyAttr", the array folding will occur first and then the grouped
471 element names will be eliminated.
472
473 "XMLout" will also use the grouptag mappings to re-introduce the tags
474 around the grouped elements. Beware though that this will occur in all
475 places that the 'grouping tag' name occurs - you probably don't want to
476 use the same name for elements as well as attributes.
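
A sketch of the round trip, using a shortened version of the XML above: the same mapping is supplied to both calls so that "XMLout" can put the grouped tags back. The variable names are illustrative.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<opt>
  <searchpath>
    <dir>/usr/bin</dir>
    <dir>/usr/local/bin</dir>
  </searchpath>
</opt>
XML

my %group = (searchpath => 'dir');

my $opt = XMLin($xml, GroupTags => \%group);
print join(':', @{ $opt->{searchpath} }), "\n";    # /usr/bin:/usr/local/bin

# The <dir> elements are re-introduced inside <searchpath> on output
my $out = XMLout($opt, GroupTags => \%group);
print $out;
```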
477
478 Handler => object_ref # out - SAX only
479 Use the 'Handler' option to have "XMLout()" generate SAX events rather
480 than returning a string of XML. For more details see "SAX SUPPORT"
481 below.
482
483 Note: the current implementation of this option generates a string of
484 XML and uses a SAX parser to translate it into SAX events. The normal
485 encoding rules apply here - your data must be UTF8 encoded unless you
486 specify an alternative encoding via the 'XMLDecl' option; and by the
487 time the data reaches the handler object, it will be in UTF8 form
488 regardless of the encoding you supply. A future implementation of this
489 option may generate the events directly.
490
491 KeepRoot => 1 # in+out - handy
492 In its attempt to return a data structure free of superfluous detail
493 and unnecessary levels of indirection, "XMLin()" normally discards the
494 root element name. Setting the 'KeepRoot' option to '1' will cause the
495 root element name to be retained. So after executing this code:
496
497 $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)
498
499 You'll be able to reference the tempdir as
500 "$config->{config}->{tempdir}" instead of the default
501 "$config->{tempdir}".
502
503 Similarly, setting the 'KeepRoot' option to '1' will tell "XMLout()"
504 that the data structure already contains a root element name and it is
505 not necessary to add another.
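
Used on both sides, the option lets the root element name survive a round trip. A short sketch:

```perl
use strict;
use warnings;
use XML::Simple;

my $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1);
print $config->{config}->{tempdir}, "\n";    # /tmp

# The top-level hash key is used as the root element name
my $xml = XMLout($config, KeepRoot => 1);
print $xml;                                  # <config tempdir="/tmp" />
```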
506
507 KeyAttr => [ list ] # in+out - important
508 This option controls the 'array folding' feature which translates
509 nested elements from an array to a hash. It also controls the
510 'unfolding' of hashes to arrays.
511
512 For example, this XML:
513
514 <opt>
515 <user login="grep" fullname="Gary R Epstein" />
516 <user login="stty" fullname="Simon T Tyson" />
517 </opt>
518
519 would, by default, parse to this:
520
521 {
522 'user' => [
523 {
524 'login' => 'grep',
525 'fullname' => 'Gary R Epstein'
526 },
527 {
528 'login' => 'stty',
529 'fullname' => 'Simon T Tyson'
530 }
531 ]
532 }
533
534 If the option 'KeyAttr => "login"' were used to specify that the
535 'login' attribute is a key, the same XML would parse to:
536
537 {
538 'user' => {
539 'stty' => {
540 'fullname' => 'Simon T Tyson'
541 },
542 'grep' => {
543 'fullname' => 'Gary R Epstein'
544 }
545 }
546 }
547
548 The key attribute names should be supplied in an arrayref if there is
549 more than one. "XMLin()" will attempt to match attribute names in the
550 order supplied. "XMLout()" will use the first attribute name supplied
551 when 'unfolding' a hash into an array.
552
Note 1: The default value for 'KeyAttr' is ['name', 'key', 'id']. If
you do not want folding on input or unfolding on output you must set
this option to an empty list to disable the feature.
556
557 Note 2: If you wish to use this option, you should also enable the
558 "ForceArray" option. Without 'ForceArray', a single nested element
559 will be rolled up into a scalar rather than an array and therefore will
560 not be folded (since only arrays get folded).
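
The 'unfolding' direction can be sketched as follows, starting from a hash keyed on login name (sample data from the example above). 'ForceArray' is given to "XMLin()" as recommended in Note 2.

```perl
use strict;
use warnings;
use XML::Simple;

my $data = {
    user => {
        grep => { fullname => 'Gary R Epstein' },
        stty => { fullname => 'Simon T Tyson'  },
    },
};

# Each hash key becomes a 'login' attribute on a <user> element
my $xml = XMLout($data, KeyAttr => [ 'login' ]);
print $xml;

# Reading it back with the same key attribute folds it up again
my $back = XMLin($xml, KeyAttr => [ 'login' ], ForceArray => [ 'user' ]);
print $back->{user}->{stty}->{fullname}, "\n";    # Simon T Tyson
```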
561
562 KeyAttr => { list } # in+out - important
This alternative (and preferred) method of specifying the key
attributes allows more fine grained control over which elements are
folded and on which attributes. For example the option 'KeyAttr => {
package => 'id' }' will cause any package elements to be folded on the
'id' attribute. No other elements which have an 'id' attribute will be
folded at all.
569
570 Note: "XMLin()" will generate a warning (or a fatal error in "STRICT
571 MODE") if this syntax is used and an element which does not have the
572 specified key attribute is encountered (eg: a 'package' element without
573 an 'id' attribute, to use the example above). Warnings will only be
574 generated if -w is in force.
575
576 Two further variations are made possible by prefixing a '+' or a '-'
577 character to the attribute name:
578
579 The option 'KeyAttr => { user => "+login" }' will cause this XML:
580
581 <opt>
582 <user login="grep" fullname="Gary R Epstein" />
583 <user login="stty" fullname="Simon T Tyson" />
584 </opt>
585
586 to parse to this data structure:
587
588 {
589 'user' => {
590 'stty' => {
591 'fullname' => 'Simon T Tyson',
592 'login' => 'stty'
593 },
594 'grep' => {
595 'fullname' => 'Gary R Epstein',
596 'login' => 'grep'
597 }
598 }
599 }
600
601 The '+' indicates that the value of the key attribute should be copied
602 rather than moved to the folded hash key.
603
604 A '-' prefix would produce this result:
605
606 {
607 'user' => {
608 'stty' => {
609 'fullname' => 'Simon T Tyson',
610 '-login' => 'stty'
611 },
612 'grep' => {
613 'fullname' => 'Gary R Epstein',
614 '-login' => 'grep'
615 }
616 }
617 }
618
619 As described earlier, "XMLout" will ignore hash keys starting with a
620 '-'.
621
622 NoAttr => 1 # in+out - handy
623 When used with "XMLout()", the generated XML will contain no
624 attributes. All hash key/values will be represented as nested elements
625 instead.
626
627 When used with "XMLin()", any attributes in the XML will be ignored.
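
A brief sketch of both directions, with made-up data:

```perl
use strict;
use warnings;
use XML::Simple;

# Output: every key/value pair becomes a nested element
my $xml = XMLout({ name => 'sahara', osname => 'solaris' }, NoAttr => 1);
print $xml;

# Input: the 'id' attribute is ignored, only the nested element survives
my $ref = XMLin('<opt id="7"><name>sahara</name></opt>', NoAttr => 1);
print join(',', sort keys %$ref), "\n";    # name
```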
628
629 NoEscape => 1 # out - seldom used
By default, "XMLout()" will translate the characters '<', '>', '&' and
'"' to '&lt;', '&gt;', '&amp;' and '&quot;' respectively. Use this
option to suppress escaping (presumably because you've already escaped
the data in some more sophisticated manner).
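
The difference is easiest to see with a value that contains markup characters (sample data only):

```perl
use strict;
use warnings;
use XML::Simple;

my $data = { note => 'AT&T <rocks>' };

my $escaped = XMLout($data);
my $raw     = XMLout($data, NoEscape => 1);

print $escaped;    # <opt note="AT&amp;T &lt;rocks&gt;" />
print $raw;        # <opt note="AT&T <rocks>" />   (not well formed)
```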
634
635 NoIndent => 1 # out - seldom used
636 Set this option to 1 to disable "XMLout()"'s default 'pretty printing'
637 mode. With this option enabled, the XML output will all be on one line
638 (unless there are newlines in the data) - this may be easier for
639 downstream processing.
640
641 NoSort => 1 # out - seldom used
642 Newer versions of XML::Simple sort elements and attributes
643 alphabetically (*), by default. Enable this option to suppress the
644 sorting - possibly for backwards compatibility.
645
646 * Actually, sorting is alphabetical but 'key' attribute or element
647 names (as in 'KeyAttr') sort first. Also, when a hash of hashes is
648 'unfolded', the elements are sorted alphabetically by the value of the
649 key field.
650
651 NormaliseSpace => 0 | 1 | 2 # in - handy
652 This option controls how whitespace in text content is handled.
653 Recognised values for the option are:
654
655 · 0 = (default) whitespace is passed through unaltered (except of
656 course for the normalisation of whitespace in attribute values
657 which is mandated by the XML recommendation)
658
659 · 1 = whitespace is normalised in any value used as a hash key
660 (normalising means removing leading and trailing whitespace and
661 collapsing sequences of whitespace characters to a single space)
662
663 · 2 = whitespace is normalised in all text content
664
665 Note: you can spell this option with a 'z' if that is more natural for
666 you.
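
A short sketch contrasting the default with level 2, using a made-up string that has leading, trailing and embedded whitespace:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = "<opt><msg>  hello \n  world  </msg></opt>";

my $raw  = XMLin($xml);                         # whitespace untouched
my $tidy = XMLin($xml, NormaliseSpace => 2);    # trimmed and collapsed

print "[$tidy->{msg}]\n";    # [hello world]
```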
667
668 NSExpand => 1 # in+out handy - SAX only
669 This option controls namespace expansion - the translation of element
670 and attribute names of the form 'prefix:name' to '{uri}name'. For
671 example the element name 'xsl:template' might be expanded to:
672 '{http://www.w3.org/1999/XSL/Transform}template'.
673
By default, "XMLin()" will return element names and attribute names
exactly as they appear in the XML. Setting this option to 1 will cause
all element and attribute names to be expanded to include their
namespace URI.
678
679 Note: You must be using a SAX parser for this option to work (ie: it
680 does not work with XML::Parser).
681
682 This option also controls whether "XMLout()" performs the reverse
683 translation from '{uri}name' back to 'prefix:name'. The default is no
684 translation. If your data contains expanded names, you should set this
685 option to 1 otherwise "XMLout" will emit XML which is not well formed.
686
687 Note: You must have the XML::NamespaceSupport module installed if you
688 want "XMLout()" to translate URIs back to prefixes.
689
690 NumericEscape => 0 | 1 | 2 # out - handy
Use this option to have 'high' (non-ASCII) characters in your Perl data
structure converted to numeric entities (eg: &#8364;) in the XML
output. Three levels are possible:
694
695 0 - default: no numeric escaping (OK if you're writing out UTF8)
696
697 1 - only characters above 0xFF are escaped (ie: characters in the
698 0x80-FF range are not escaped), possibly useful with ISO8859-1 output
699
700 2 - all characters above 0x7F are escaped (good for plain ASCII output)
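
The two escaping levels can be compared with one character above 0xFF (the euro sign) and one in the 0x80-0xFF range (e-acute). The data is made up for the sketch.

```perl
use strict;
use warnings;
use XML::Simple;

my $data = {
    price => "\x{20AC}100",    # euro sign, above 0xFF
    cafe  => "caf\x{E9}",      # e-acute, within 0x80-0xFF
};

my $level1 = XMLout($data, NumericEscape => 1);    # only the euro is escaped
my $level2 = XMLout($data, NumericEscape => 2);    # both are escaped

print $level2;    # pure ASCII: contains &#8364;100 and caf&#233;
```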
701
702 OutputFile => <file specifier> # out - handy
703 The default behaviour of "XMLout()" is to return the XML as a string.
704 If you wish to write the XML to a file, simply supply the filename
705 using the 'OutputFile' option.
706
707 This option also accepts an IO handle object - especially useful in
708 Perl 5.8.0 and later for output using an encoding other than UTF-8, eg:
709
710 open my $fh, '>:encoding(iso-8859-1)', $path or die "open($path): $!";
711 XMLout($ref, OutputFile => $fh);
712
713 Note, XML::Simple does not require that the object you pass in to the
714 OutputFile option inherits from IO::Handle - it simply assumes the
715 object supports a "print" method.
716
717 ParserOpts => [ XML::Parser Options ] # in - don't use this
718 Note: This option is now officially deprecated. If you find it useful,
719 email the author with an example of what you use it for. Do not use
720 this option to set the ProtocolEncoding, that's just plain wrong - fix
721 the XML.
722
723 This option allows you to pass parameters to the constructor of the
724 underlying XML::Parser object (which of course assumes you're not using
725 SAX).
726
727 RootName => 'string' # out - handy
728 By default, when "XMLout()" generates XML, the root element will be
729 named 'opt'. This option allows you to specify an alternative name.
730
731 Specifying either undef or the empty string for the RootName option
732 will produce XML with no root elements. In most cases the resulting
733 XML fragment will not be 'well formed' and therefore could not be read
734 back in by "XMLin()". Nevertheless, the option has been found to be
735 useful in certain circumstances.
736
737 SearchPath => [ list ] # in - handy
If you pass "XMLin()" a filename, but the filename includes no directory
739 component, you can use this option to specify which directories should
740 be searched to locate the file. You might use this option to search
741 first in the user's home directory, then in a global directory such as
742 /etc.
743
744 If a filename is provided to "XMLin()" but SearchPath is not defined,
745 the file is assumed to be in the current directory.
746
747 If the first parameter to "XMLin()" is undefined, the default
748 SearchPath will contain only the directory in which the script itself
749 is located. Otherwise the default SearchPath will be empty.
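
A sketch of a search across two directories. Both the temporary directory and '/nonexistent/dir' are placeholders chosen for the example. The file is found in the second entry.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::Simple;

# Create params.xml in a throwaway directory
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'params.xml');
open my $out, '>', $file or die "open($file): $!";
print {$out} '<opt username="bob" />';
close $out or die "close($file): $!";

# A bare filename is looked up in each SearchPath directory in turn
my $ref = XMLin('params.xml', SearchPath => [ '/nonexistent/dir', $dir ]);
print $ref->{username}, "\n";    # bob
```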
750
751 SuppressEmpty => 1 | '' | undef # in+out - handy
752 This option controls what "XMLin()" should do with empty elements (no
753 attributes and no content). The default behaviour is to represent them
754 as empty hashes. Setting this option to a true value (eg: 1) will
755 cause empty elements to be skipped altogether. Setting the option to
756 'undef' or the empty string will cause empty elements to be represented
757 as the undefined value or the empty string respectively. The latter
758 two alternatives are a little easier to test for in your code than a
759 hash with no keys.
760
761 The option also controls what "XMLout()" does with undefined values.
762 Setting the option to undef causes undefined values to be output as
763 empty elements (rather than empty attributes), it also suppresses the
764 generation of warnings about undefined values. Setting the option to a
765 true value (eg: 1) causes undefined values to be skipped altogether on
766 output.
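
The input side of this option can be summarised with one empty element parsed four ways (a sketch with made-up XML):

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><a></a><b>1</b></opt>';

my $default = XMLin($xml);                            # a => {}
my $skipped = XMLin($xml, SuppressEmpty => 1);        # no 'a' key at all
my $blank   = XMLin($xml, SuppressEmpty => '');       # a => ''
my $undef   = XMLin($xml, SuppressEmpty => undef);    # a => undef

print exists $skipped->{a} ? "kept\n" : "skipped\n";  # skipped
```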
767
768 ValueAttr => [ names ] # in - handy
Use this option to deal with elements which always have a single
attribute and no content. Eg:
771
772 <opt>
773 <colour value="red" />
774 <size value="XXL" />
775 </opt>
776
777 Setting "ValueAttr => [ 'value' ]" will cause the above XML to parse
778 to:
779
780 {
781 colour => 'red',
782 size => 'XXL'
783 }
784
785 instead of this (the default):
786
787 {
788 colour => { value => 'red' },
789 size => { value => 'XXL' }
790 }
791
792 Note: This form of the ValueAttr option is not compatible with
793 "XMLout()" - since the attribute name is discarded at parse time, the
794 original XML cannot be reconstructed.
795
796 ValueAttr => { element => attribute, ... } # in+out - handy
797 This (preferred) form of the ValueAttr option requires you to specify
798 both the element and the attribute names. This is not only safer, it
799 also allows the original XML to be reconstructed by "XMLout()".
800
801 Note: You probably don't want to use this option and the NoAttr option
802 at the same time.
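
A sketch of the hashref form in both directions, reusing the colour/size example from above. The variable names are illustrative.

```perl
use strict;
use warnings;
use XML::Simple;

my %value_attr = (colour => 'value', size => 'value');

my $ref = XMLin(
    '<opt><colour value="red" /><size value="XXL" /></opt>',
    ValueAttr => \%value_attr,
);
print "$ref->{colour} $ref->{size}\n";    # red XXL

# With the same mapping, the 'value' attributes are restored on output
my $xml = XMLout($ref, ValueAttr => \%value_attr);
print $xml;
```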
803
804 Variables => { name => value } # in - handy
This option allows variables in the XML to be expanded when the file is
read. (There is no facility for putting the variable names back if you
regenerate XML using "XMLout".)
808
809 A 'variable' is any text of the form "${name}" which occurs in an
810 attribute value or in the text content of an element. If 'name'
811 matches a key in the supplied hashref, "${name}" will be replaced with
812 the corresponding value from the hashref. If no matching key is found,
813 the variable will not be replaced. Names must match the regex:
814 "[\w.]+" (ie: only 'word' characters and dots are allowed).
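
A short sketch of variable expansion follows. Note the single quotes around the XML, which stop Perl itself from interpolating "${prefix}". The element and variable names are made up.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><bindir>${prefix}/bin</bindir><other>${unknown}/x</other></opt>';

my $ref = XMLin($xml, Variables => { prefix => '/usr/local' });

print "$ref->{bindir}\n";    # /usr/local/bin
print "$ref->{other}\n";     # ${unknown}/x  (no matching key, left as is)
```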
815
816 VarAttr => 'attr_name' # in - handy
817 In addition to the variables defined using "Variables", this option
818 allows variables to be defined in the XML. A variable definition
819 consists of an element with an attribute called 'attr_name' (the value
820 of the "VarAttr" option). The value of the attribute will be used as
821 the variable name and the text content of the element will be used as
822 the value. A variable defined in this way will override a variable
823 defined using the "Variables" option. For example:
824
825 XMLin( '<opt>
826 <dir name="prefix">/usr/local/apache</dir>
827 <dir name="exec_prefix">${prefix}</dir>
828 <dir name="bindir">${exec_prefix}/bin</dir>
829 </opt>',
830 VarAttr => 'name', ContentKey => '-content'
831 );
832
833 produces the following data structure:
834
835 {
836 dir => {
837 prefix => '/usr/local/apache',
838 exec_prefix => '/usr/local/apache',
839 bindir => '/usr/local/apache/bin',
840 }
841 }
842
843 XMLDecl => 1 or XMLDecl => 'string' # out - handy
844 If you want the output from "XMLout()" to start with the optional XML
845 declaration, simply set the option to '1'. The default XML declaration
846 is:
847
848 <?xml version='1.0' standalone='yes'?>
849
850 If you want some other string (for example to declare an encoding
851 value), set the value of this option to the complete string you
852 require.
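
As a brief sketch of both forms (the data and the encoding value are illustrative only):

```perl
use strict;
use warnings;
use XML::Simple;

my $data = { name => 'example' };

# XMLDecl => 1 prepends the default declaration.
my $default_decl = XMLout($data, RootName => 'opt', KeyAttr => [], XMLDecl => 1);

# Any other value is prepended as given.
my $custom_decl = XMLout(
    $data,
    RootName => 'opt',
    KeyAttr  => [],
    XMLDecl  => "<?xml version='1.0' encoding='UTF-8'?>",
);

print $default_decl, $custom_decl;
```

Note that the string is simply placed at the start of the output; declaring an encoding here does not by itself change how the returned string is encoded.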
853
855 The procedural interface is both simple and convenient; however, there
856 are a couple of reasons why you might prefer to use the object oriented
857 (OO) interface:
858
859 · to define a set of default values which should be used on all
860 subsequent calls to "XMLin()" or "XMLout()"
861
862 · to override methods in XML::Simple to provide customised behaviour
863
864 The default values for the options described above are unlikely to suit
865 everyone. The OO interface allows you to effectively override
866 XML::Simple's defaults with your preferred values. It works like this:
867
868 First create an XML::Simple parser object with your preferred defaults:
869
870 my $xs = XML::Simple->new(ForceArray => 1, KeepRoot => 1);
871
872 then call "XMLin()" or "XMLout()" as a method of that object:
873
874 my $ref = $xs->XMLin($xml);
875 my $xml = $xs->XMLout($ref);
876
877 You can also specify options when you make the method calls and these
878 values will be merged with the values specified when the object was
879 created. Values specified in a method call take precedence.
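
A minimal sketch of the merging behaviour, using a made-up one-element document: the constructor default (ForceArray => 1) applies to the first call and is overridden for the second call only.

```perl
use strict;
use warnings;
use XML::Simple;

my $xs  = XML::Simple->new(ForceArray => 1, KeyAttr => []);
my $xml = '<opt><item>one</item></opt>';

# Uses the defaults given to new(): 'item' is forced into an array.
my $with_defaults = $xs->XMLin($xml);

# The per-call value takes precedence over the constructor default.
my $overridden = $xs->XMLin($xml, ForceArray => 0);

print ref($with_defaults->{item}), "\n";   # ARRAY
print $overridden->{item}, "\n";           # one
```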
880
881 Note: when called as methods, the "XMLin()" and "XMLout()" routines may
882 be called as "xml_in()" or "xml_out()". The method names are aliased
883 so the only difference is the aesthetics.
884
885 Parsing Methods
886 You can explicitly call one of the following methods rather than rely
887 on the "xml_in()" method automatically determining whether the target
888 to be parsed is a string, a file or a filehandle:
889
890 parse_string(text)
891 Works exactly like the "xml_in()" method but assumes the first
892 argument is a string of XML (or a reference to a scalar containing
893 a string of XML).
894
895 parse_file(filename)
896 Works exactly like the "xml_in()" method but assumes the first
897 argument is the name of a file containing XML.
898
899 parse_fh(file_handle)
900 Works exactly like the "xml_in()" method but assumes the first
901 argument is a filehandle which can be read to get XML.
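
For illustration, here is a short sketch using two of these methods. The temporary file exists only to give "parse_file()" something to read; the attribute values are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Simple;

my $xs = XML::Simple->new(KeyAttr => [], ForceArray => 0);

# The argument is always treated as XML text, never as a filename.
my $from_string = $xs->parse_string('<opt host="alpha" />');

# Write a small document to a temporary file, then parse it by name.
my ($fh, $filename) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<opt host="beta" />';
close $fh or die "close failed: $!";

my $from_file = $xs->parse_file($filename);

print "$from_string->{host} $from_file->{host}\n";   # alpha beta
```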
902
903 Hook Methods
904 You can make your own class which inherits from XML::Simple and
905 overrides certain behaviours. The following methods may provide useful
906 'hooks' upon which to hang your modified behaviour. You may find other
907 undocumented methods by examining the source, but those may be subject
908 to change in future releases.
909
910 handle_options(direction, name => value ...)
911 This method will be called when one of the parsing methods or the
912 "XMLout()" method is called. The initial argument will be a string
913 (either 'in' or 'out') and the remaining arguments will be name
914 value pairs.
915
916 default_config_file()
917 Calculates and returns the name of the file which should be parsed
918 if no filename is passed to "XMLin()" (default: "$0.xml").
919
920 build_simple_tree(filename, string)
921 Called from "XMLin()" or any of the parsing methods. Takes either
922 a file name as the first argument or "undef" followed by a 'string'
923 as the second argument. Returns a simple tree data structure. You
924 could override this method to apply your own transformations before
925 the data structure is returned to the caller.
926
927 new_hashref()
928 When the 'simple tree' data structure is being built, this method
929 will be called to create any required anonymous hashrefs.
930
931 sorted_keys(name, hashref)
932 Called when "XMLout()" is translating a hashref to XML. This
933 routine returns a list of hash keys in the order that the
934 corresponding attributes/elements should appear in the output.
935
936 escape_value(string)
937 Called from "XMLout()", takes a string and returns a copy of the
938 string with XML character escaping rules applied.
939
940 numeric_escape(string)
941 Called from "escape_value()", to handle non-ASCII characters
942 (depending on the value of the NumericEscape option).
943
944 copy_hash(hashref, extra_key => value, ...)
945 Called from "XMLout()", when 'unfolding' a hash of hashes into an
946 array of hashes. You might wish to override this method if you're
947 using tied hashes and don't want them to get untied.
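
As a sketch of how a hook might be overridden, the following subclass replaces "sorted_keys()" so that keys are emitted in reverse alphabetical order. The package name My::XML::ReverseKeys is hypothetical, and this simplified override ignores the special ordering that the stock method gives to key attributes.

```perl
use strict;
use warnings;

package My::XML::ReverseKeys;   # hypothetical subclass name
use XML::Simple ();
our @ISA = ('XML::Simple');

# Receives the element name and the hashref being written out;
# returns the keys in the order they should appear in the output.
sub sorted_keys {
    my ($self, $name, $hashref) = @_;
    return reverse sort keys %$hashref;
}

package main;

my $xs  = My::XML::ReverseKeys->new(KeyAttr => [], RootName => 'opt');
my $xml = $xs->XMLout({ alpha => 1, beta => 2, gamma => 3 });

print $xml;   # <opt gamma="3" beta="2" alpha="1" />
```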
948
949 Cache Methods
950 XML::Simple implements three caching schemes ('storable', 'memshare'
951 and 'memcopy'). You can implement a custom caching scheme by
952 implementing two methods - one for reading from the cache and one for
953 writing to it.
954
955 For example, you might implement a new 'dbm' scheme that stores cached
956 data structures using the MLDBM module. First, you would add a
957 "cache_read_dbm()" method which accepted a filename for use as a lookup
958 key and returned a data structure on success, or undef on failure.
959 Then, you would implement a "cache_write_dbm()" method which accepted a
960 data structure and a filename.
961
962 You would use this caching scheme by specifying the option:
963
964 Cache => [ 'dbm' ]
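
The same pattern can be sketched without MLDBM. The example below defines a hypothetical 'myhash' scheme that keeps parsed data in an ordinary Perl hash; the package name, scheme name and counters are all invented for illustration, and unlike a real scheme it never checks whether the file has changed since it was cached.

```perl
use strict;
use warnings;

package My::XML::HashCache;    # hypothetical subclass name
use XML::Simple ();
our @ISA = ('XML::Simple');

my %cache;                     # filename => parsed data structure
our ($hits, $writes) = (0, 0); # counters, only here to show what happened

# Called with the filename; returns cached data, or nothing on a miss.
sub cache_read_myhash {
    my ($self, $filename) = @_;
    return unless exists $cache{$filename};
    $hits++;
    return $cache{$filename};
}

# Called with the freshly parsed data structure and the filename.
sub cache_write_myhash {
    my ($self, $data, $filename) = @_;
    $cache{$filename} = $data;
    $writes++;
    return;
}

package main;
use File::Temp qw(tempfile);

my ($fh, $filename) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<opt colour="blue" />';
close $fh or die "close failed: $!";

my $xs = My::XML::HashCache->new(
    Cache      => [ 'myhash' ],
    KeyAttr    => [],
    ForceArray => 0,
);

my $first  = $xs->XMLin($filename);   # parsed, then written to the cache
my $second = $xs->XMLin($filename);   # served from the cache

print "writes=$My::XML::HashCache::writes hits=$My::XML::HashCache::hits\n";
```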
965
966 STRICT MODE
967 If you import the XML::Simple routines like this:
968
969 use XML::Simple qw(:strict);
970
971 the following common mistakes will be detected and treated as fatal
972 errors:
973
974 · Failing to explicitly set the "KeyAttr" option - if you can't be
975 bothered reading about this option, turn it off with:
976 KeyAttr => [ ]
977
978 · Failing to explicitly set the "ForceArray" option - if you can't be
979 bothered reading about this option, set it to the safest mode with:
980 ForceArray => 1
981
982 · Setting ForceArray to an array, but failing to list all the
983 elements from the KeyAttr hash.
984
985 · Data error - KeyAttr is set to say { part => 'partnum' } but the
986 XML contains one or more <part> elements without a 'partnum'
987 attribute (or nested element). Note: if strict mode is not set but
988 -w is, this condition triggers a warning.
989
990 · Data error - as above, but non-unique values are present in the key
991 attribute (eg: more than one <part> element with the same partnum).
992 This will also trigger a warning if strict mode is not enabled.
993
994 · Data error - as above, but value of key attribute (eg: partnum) is
995 not a scalar string (due to nested elements etc). This will also
996 trigger a warning if strict mode is not enabled.
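
A short sketch of two of these checks in action, using invented part data. Each failing call is wrapped in eval so that the fatal error can be inspected rather than ending the script.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><part partnum="555-1234" desc="bolt" /></opt>';

# Neither KeyAttr nor ForceArray is given: fatal in strict mode.
my $unused = eval { XMLin($xml) };
my $missing_option_error = $@;

# Both options set explicitly, and ForceArray lists every element
# named in the KeyAttr hash.
my $parts = XMLin(
    $xml,
    KeyAttr    => { part => 'partnum' },
    ForceArray => [ 'part' ],
);

# Data error: a <part> element without the 'partnum' key attribute.
my $bad = eval {
    XMLin(
        '<opt><part desc="nut" /></opt>',
        KeyAttr    => { part => 'partnum' },
        ForceArray => [ 'part' ],
    );
};
my $data_error = $@;

print "option error: $missing_option_error";
print "data error:   $data_error";
print "desc: $parts->{part}{'555-1234'}{desc}\n";   # desc: bolt
```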
997
998 SAX SUPPORT
999 From version 1.08_01, XML::Simple includes support for SAX (the Simple
1000 API for XML) - specifically SAX2.
1001
1002 In a typical SAX application, an XML parser (or SAX 'driver') module
1003 generates SAX events (start of element, character data, end of element,
1004 etc) as it parses an XML document and a 'handler' module processes the
1005 events to extract the required data. This simple model allows for some
1006 interesting and powerful possibilities:
1007
1008 · Applications written to the SAX API can extract data from huge XML
1009 documents without the memory overheads of a DOM or tree API.
1010
1011 · The SAX API allows for plug and play interchange of parser modules
1012 without having to change your code to fit a new module's API. A
1013 number of SAX parsers are available with capabilities ranging from
1014 extreme portability to blazing performance.
1015
1016 · A SAX 'filter' module can implement both a handler interface for
1017 receiving data and a generator interface for passing modified data
1018 on to a downstream handler. Filters can be chained together in
1019 'pipelines'.
1020
1021 · One filter module might split a data stream to direct data to two
1022 or more downstream handlers.
1023
1024 · Generating SAX events is not the exclusive preserve of XML parsing
1025 modules. For example, a module might extract data from a
1026 relational database using DBI and pass it on to a SAX pipeline for
1027 filtering and formatting.
1028
1029 XML::Simple can operate at either end of a SAX pipeline. For example,
1030 you can take a data structure in the form of a hashref and pass it into
1031 a SAX pipeline using the 'Handler' option on "XMLout()":
1032
1033 use XML::Simple;
1034 use Some::SAX::Filter;
1035 use XML::SAX::Writer;
1036
1037 my $ref = {
1038 .... # your data here
1039 };
1040
1041 my $writer = XML::SAX::Writer->new();
1042 my $filter = Some::SAX::Filter->new(Handler => $writer);
1043 my $simple = XML::Simple->new(Handler => $filter);
1044 $simple->XMLout($ref);
1045
1046 You can also put XML::Simple at the opposite end of the pipeline to
1047 take advantage of the simple 'tree' data structure once the relevant
1048 data has been isolated through filtering:
1049
1050 use XML::SAX;
1051 use Some::SAX::Filter;
1052 use XML::Simple;
1053
1054 my $simple = XML::Simple->new(ForceArray => 1, KeyAttr => ['partnum']);
1055 my $filter = Some::SAX::Filter->new(Handler => $simple);
1056 my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1057
1058 my $ref = $parser->parse_uri('some_huge_file.xml');
1059
1060 print $ref->{part}->{'555-1234'};
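
The example above relies on a placeholder filter module. A smaller sketch that can be run as-is (assuming XML::SAX is installed) drops the filter and feeds the parser's events straight into an XML::Simple handler; the inventory data is made up.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;
use XML::Simple;

my $simple = XML::Simple->new(ForceArray => 1, KeyAttr => ['partnum']);

# Whichever SAX parser the factory selects sends its events to $simple.
my $parser = XML::SAX::ParserFactory->parser(Handler => $simple);

# The value returned by the parse is the tree built by the handler.
my $ref = $parser->parse_string(
    '<inventory>
       <part partnum="555-1234" desc="bolt" />
       <part partnum="555-9876" desc="nut" />
     </inventory>'
);

print $ref->{part}{'555-1234'}{desc}, "\n";   # bolt
```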
1061
1062 You can build a filter by using an XML::Simple object as a handler and
1063 setting its DataHandler option to point to a routine which takes the
1064 resulting tree, modifies it and sends it off as SAX events to a
1065 downstream handler:
1066
1067 my $writer = XML::SAX::Writer->new();
1068 my $filter = XML::Simple->new(
1069 DataHandler => sub {
1070 my $simple = shift;
1071 my $data = shift;
1072
1073 # Modify $data here
1074
1075 $simple->XMLout($data, Handler => $writer);
1076 }
1077 );
1078 my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1079
1080 $parser->parse_uri($filename);
1081
1082 Note: In this last example, the 'Handler' option was specified in the
1083 call to "XMLout()" but it could also have been specified in the
1084 constructor.
1085
1087 If you don't care which parser module XML::Simple uses then skip this
1088 section entirely (it looks more complicated than it really is).
1089
1090 XML::Simple will default to using a SAX parser if one is available or
1091 XML::Parser if SAX is not available.
1092
1093 You can dictate which parser module is used by setting either the
1094 environment variable 'XML_SIMPLE_PREFERRED_PARSER' or the package
1095 variable $XML::Simple::PREFERRED_PARSER to contain the module name.
1096 The following rules are used:
1097
1098 · The package variable takes precedence over the environment variable
1099 if both are defined. To force XML::Simple to ignore the
1100 environment settings and use its default rules, you can set the
1101 package variable to an empty string.
1102
1103 · If the 'preferred parser' is set to the string 'XML::Parser', then
1104 XML::Parser will be used (or "XMLin()" will die if XML::Parser is
1105 not installed).
1106
1107 · If the 'preferred parser' is set to some other value, then it is
1108 assumed to be the name of a SAX parser module and is passed to
1109 XML::SAX::ParserFactory. If XML::SAX is not installed, or the
1110 requested parser module is not installed, then "XMLin()" will die.
1111
1112 · If the 'preferred parser' is not defined at all (the normal default
1113 state), an attempt will be made to load XML::SAX. If XML::SAX is
1114 installed, then a parser module will be selected according to
1115 XML::SAX::ParserFactory's normal rules (which typically means the
1116 last SAX parser installed).
1117
1118 · If the 'preferred parser' is not defined and XML::SAX is not
1119 installed, then XML::Parser will be used. "XMLin()" will die if
1120 XML::Parser is not installed.
1121
1122 Note: The XML::SAX distribution includes an XML parser written entirely
1123 in Perl. It is very portable but it is not very fast. You should
1124 consider installing XML::LibXML or XML::SAX::Expat if they are
1125 available for your platform.
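
As a sketch of the package variable in use, the following forces XML::Parser for a single call. The "local" keeps the setting from leaking into other code, and the eval covers the case where XML::Parser is not installed (in which case "XMLin()" dies, as described above).

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt colour="blue" />';
my ($ref, $error);

{
    # Takes precedence over XML_SIMPLE_PREFERRED_PARSER in the environment.
    local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';

    $ref   = eval { XMLin($xml, KeyAttr => [], ForceArray => 0) };
    $error = $@;
}

if (defined $ref) {
    print "parsed with XML::Parser: colour=$ref->{colour}\n";
}
else {
    print "XML::Parser could not be used: $error";
}
```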
1126
1128 The XML standard is very clear on the issue of non-compliant documents.
1129 An error in parsing any single element (for example a missing end tag)
1130 must cause the whole document to be rejected. XML::Simple will die
1131 with an appropriate message if it encounters a parsing error.
1132
1133 If dying is not appropriate for your application, you should arrange to
1134 call "XMLin()" in an eval block and look for errors in $@. eg:
1135
1136 my $config = eval { XMLin() };
1137 PopUpMessage($@) if($@);
1138
1139 Note, there is a common misconception that use of eval will
1140 significantly slow down a script. While that may be true when the code
1141 being eval'd is in a string, it is not true of code like the sample
1142 above.
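
A slightly fuller sketch of the same idea, parsing a deliberately malformed string and falling back to an empty configuration. The fallback policy is an assumption made for illustration, not something XML::Simple does itself.

```perl
use strict;
use warnings;
use XML::Simple;

# Malformed on purpose: <item> is never closed.
my $bad_xml = '<opt><item>unclosed</opt>';

my $config = eval { XMLin($bad_xml, KeyAttr => [], ForceArray => 0) };

if (my $error = $@) {
    warn "Could not parse configuration: $error";
    $config = {};   # carry on with an empty configuration
}

print "keys found: ", scalar(keys %$config), "\n";   # keys found: 0
```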
1143
1145 When "XMLin()" reads the following very simple piece of XML:
1146
1147 <opt username="testuser" password="frodo"></opt>
1148
1149 it returns the following data structure:
1150
1151 {
1152 'username' => 'testuser',
1153 'password' => 'frodo'
1154 }
1155
1156 The identical result could have been produced with this alternative
1157 XML:
1158
1159 <opt username="testuser" password="frodo" />
1160
1161 Or this (although see 'ForceArray' option for variations):
1162
1163 <opt>
1164 <username>testuser</username>
1165 <password>frodo</password>
1166 </opt>
1167
1168 Repeated nested elements are represented as anonymous arrays:
1169
1170 <opt>
1171 <person firstname="Joe" lastname="Smith">
1172 <email>joe@smith.com</email>
1173 <email>jsmith@yahoo.com</email>
1174 </person>
1175 <person firstname="Bob" lastname="Smith">
1176 <email>bob@smith.com</email>
1177 </person>
1178 </opt>
1179
1180 {
1181 'person' => [
1182 {
1183 'email' => [
1184 'joe@smith.com',
1185 'jsmith@yahoo.com'
1186 ],
1187 'firstname' => 'Joe',
1188 'lastname' => 'Smith'
1189 },
1190 {
1191 'email' => 'bob@smith.com',
1192 'firstname' => 'Bob',
1193 'lastname' => 'Smith'
1194 }
1195 ]
1196 }
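
Note that in the structure above 'email' is an arrayref for Joe but a plain scalar for Bob. One way to get a consistent shape, sketched here with the same data, is to name the element in the "ForceArray" option so that code walking the structure does not need to test for both cases:

```perl
use strict;
use warnings;
use XML::Simple;

# Single quotes stop Perl from interpolating the @ in the addresses.
my $xml = '<opt>
  <person firstname="Joe" lastname="Smith">
    <email>joe@smith.com</email>
    <email>jsmith@yahoo.com</email>
  </person>
  <person firstname="Bob" lastname="Smith">
    <email>bob@smith.com</email>
  </person>
</opt>';

my $data = XMLin($xml, KeyAttr => [], ForceArray => ['email']);

# 'email' is now always an arrayref, even when there is only one.
foreach my $person (@{ $data->{person} }) {
    print "$person->{firstname}: ", join(', ', @{ $person->{email} }), "\n";
}
# prints:
#   Joe: joe@smith.com, jsmith@yahoo.com
#   Bob: bob@smith.com
```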
1197
1198 Nested elements with a recognised key attribute are transformed
1199 (folded) from an array into a hash keyed on the value of that attribute
1200 (see the "KeyAttr" option):
1201
1202 <opt>
1203 <person key="jsmith" firstname="Joe" lastname="Smith" />
1204 <person key="tsmith" firstname="Tom" lastname="Smith" />
1205 <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
1206 </opt>
1207
1208 {
1209 'person' => {
1210 'jbloggs' => {
1211 'firstname' => 'Joe',
1212 'lastname' => 'Bloggs'
1213 },
1214 'tsmith' => {
1215 'firstname' => 'Tom',
1216 'lastname' => 'Smith'
1217 },
1218 'jsmith' => {
1219 'firstname' => 'Joe',
1220 'lastname' => 'Smith'
1221 }
1222 }
1223 }
1224
1225 The <anon> tag can be used to form anonymous arrays:
1226
1227 <opt>
1228 <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
1229 <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
1230 <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
1231 <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
1232 </opt>
1233
1234 {
1235 'head' => [
1236 [ 'Col 1', 'Col 2', 'Col 3' ]
1237 ],
1238 'data' => [
1239 [ 'R1C1', 'R1C2', 'R1C3' ],
1240 [ 'R2C1', 'R2C2', 'R2C3' ],
1241 [ 'R3C1', 'R3C2', 'R3C3' ]
1242 ]
1243 }
1244
1245 Anonymous arrays can be nested to arbitrary levels and, as a special
1246 case, if the surrounding tags for an XML document contain only an
1247 anonymous array, the arrayref will be returned directly rather than the
1248 usual hashref:
1249
1250 <opt>
1251 <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
1252 <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
1253 <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
1254 </opt>
1255
1256 [
1257 [ 'Col 1', 'Col 2' ],
1258 [ 'R1C1', 'R1C2' ],
1259 [ 'R2C1', 'R2C2' ]
1260 ]
1261
1262 Elements which only contain text content will simply be represented as
1263 a scalar. Where an element has both attributes and text content, the
1264 element will be represented as a hashref with the text content in the
1265 'content' key (see the "ContentKey" option):
1266
1267 <opt>
1268 <one>first</one>
1269 <two attr="value">second</two>
1270 </opt>
1271
1272 {
1273 'one' => 'first',
1274 'two' => { 'attr' => 'value', 'content' => 'second' }
1275 }
1276
1277 Mixed content (elements which contain both text content and nested
1278 elements) will not be represented in a useful way - element order
1279 and significant whitespace will be lost. If you need to work with
1280 mixed content, then XML::Simple is not the right tool for your job -
1281 check out the next section.
1282
1284 XML::Simple is able to present a simple API because it makes some
1285 assumptions on your behalf. These include:
1286
1287 · You're not interested in text content consisting only of whitespace
1288
1289 · You don't mind that when things get slurped into a hash the order
1290 is lost
1291
1292 · You don't want fine-grained control of the formatting of generated
1293 XML
1294
1295 · You would never use a hash key that was not a legal XML element
1296 name
1297
1298 · You don't need help converting between different encodings
1299
1300 In a serious XML project, you'll probably outgrow these assumptions
1301 fairly quickly. This section of the document used to offer some advice
1302 on choosing a more powerful option. That advice has now grown into the
1303 'Perl-XML FAQ' document which you can find at:
1304 <http://perl-xml.sourceforge.net/faq/>
1305
1306 The advice in the FAQ boils down to a quick explanation of tree versus
1307 event based parsers and then recommends:
1308
1309 For event based parsing, use SAX (do not set out to write any new code
1310 for XML::Parser's handler API - it is obsolete).
1311
1312 For tree-based parsing, you could choose between the 'Perlish' approach
1313 of XML::Twig and more standards based DOM implementations - preferably
1314 one with XPath support.
1315
1317 XML::Simple requires either XML::Parser or XML::SAX.
1318
1319 To generate documents with namespaces, XML::NamespaceSupport is
1320 required.
1321
1322 The optional caching functions require Storable.
1323
1324 Answers to Frequently Asked Questions about XML::Simple are bundled
1325 with this distribution as: XML::Simple::FAQ
1326
1328 Copyright 1999-2004 Grant McLean <grantm@cpan.org>
1329
1330 This library is free software; you can redistribute it and/or modify it
1331 under the same terms as Perl itself.
1332
1333
1334
1335perl v5.10.1 2007-08-15 XML::Simple(3)