XML::Simple(3)        User Contributed Perl Documentation       XML::Simple(3)

NAME

6       XML::Simple - Easy API to maintain XML (esp config files)
7

SYNOPSIS

9           use XML::Simple;
10
11           my $ref = XMLin([<xml file or string>] [, <options>]);
12
13           my $xml = XMLout($hashref [, <options>]);
14
15       Or the object oriented way:
16
17           require XML::Simple;
18
19           my $xs = XML::Simple->new(options);
20
21           my $ref = $xs->XMLin([<xml file or string>] [, <options>]);
22
23           my $xml = $xs->XMLout($hashref [, <options>]);
24
25       (or see "SAX SUPPORT" for 'the SAX way').
26
27       To catch common errors:
28
29           use XML::Simple qw(:strict);
30
31       (see "STRICT MODE" for more details).
32

QUICK START

34       Say you have a script called foo and a file of configuration options
35       called foo.xml containing this:
36
37         <config logdir="/var/log/foo/" debugfile="/tmp/foo.debug">
38           <server name="sahara" osname="solaris" osversion="2.6">
39             <address>10.0.0.101</address>
40             <address>10.0.1.101</address>
41           </server>
42           <server name="gobi" osname="irix" osversion="6.5">
43             <address>10.0.0.102</address>
44           </server>
45           <server name="kalahari" osname="linux" osversion="2.0.34">
46             <address>10.0.0.103</address>
47             <address>10.0.1.103</address>
48           </server>
49         </config>
50
51       The following lines of code in foo:
52
53         use XML::Simple;
54
55         my $config = XMLin();
56
57       will 'slurp' the configuration options into the hashref $config
58       (because no arguments are passed to "XMLin()" the name and location of
59       the XML file will be inferred from name and location of the script).
60       You can dump out the contents of the hashref using Data::Dumper:
61
62         use Data::Dumper;
63
64         print Dumper($config);
65
66       which will produce something like this (formatting has been adjusted
67       for brevity):
68
69         {
70             'logdir'        => '/var/log/foo/',
71             'debugfile'     => '/tmp/foo.debug',
72             'server'        => {
73                 'sahara'        => {
74                     'osversion'     => '2.6',
75                     'osname'        => 'solaris',
76                     'address'       => [ '10.0.0.101', '10.0.1.101' ]
77                 },
78                 'gobi'          => {
79                     'osversion'     => '6.5',
80                     'osname'        => 'irix',
81                     'address'       => '10.0.0.102'
82                 },
83                 'kalahari'      => {
84                     'osversion'     => '2.0.34',
85                     'osname'        => 'linux',
86                     'address'       => [ '10.0.0.103', '10.0.1.103' ]
87                 }
88             }
89         }
90
91       Your script could then access the name of the log directory like this:
92
93         print $config->{logdir};
94
       Similarly, the second address on the server 'kalahari' could be
       referenced as:
97
98         print $config->{server}->{kalahari}->{address}->[1];
99
100       What could be simpler?  (Rhetorical).
101
102       For simple requirements, that's really all there is to it.  If you want
103       to store your XML in a different directory or file, or pass it in as a
104       string or even pass it in via some derivative of an IO::Handle, you'll
105       need to check out "OPTIONS".  If you want to turn off or tweak the
106       array folding feature (that neat little transformation that produced
107       $config->{server}) you'll find options for that as well.
108
109       If you want to generate XML (for example to write a modified version of
110       $config back out as XML), check out "XMLout()".
111
112       If your needs are not so simple, this may not be the module for you.
113       In that case, you might want to read "WHERE TO FROM HERE?".
114

DESCRIPTION

       The XML::Simple module provides a simple API layer on top of an
       underlying XML parsing module (either XML::Parser or one of the SAX2
       parser modules).  Two functions are exported: "XMLin()" and "XMLout()".
       Note: you can explicitly request the lower case versions of the
       function names: "xml_in()" and "xml_out()".
121
122       The simplest approach is to call these two functions directly, but an
123       optional object oriented interface (see "OPTIONAL OO INTERFACE" below)
124       allows them to be called as methods of an XML::Simple object.  The
125       object interface can also be used at either end of a SAX pipeline.
126
127       XMLin()
128
129       Parses XML formatted data and returns a reference to a data structure
130       which contains the same information in a more readily accessible form.
131       (Skip down to "EXAMPLES" below, for more sample code).
132
133       "XMLin()" accepts an optional XML specifier followed by zero or more
134       'name => value' option pairs.  The XML specifier can be one of the fol‐
135       lowing:
136
137       A filename
138           If the filename contains no directory components "XMLin()" will
139           look for the file in each directory in the SearchPath (see
140           "OPTIONS" below) or in the current directory if the SearchPath
141           option is not defined.  eg:
142
143             $ref = XMLin('/etc/params.xml');
144
145           Note, the filename '-' can be used to parse from STDIN.
146
147       undef
148           If there is no XML specifier, "XMLin()" will check the script
149           directory and each of the SearchPath directories for a file with
150           the same name as the script but with the extension '.xml'.  Note:
151           if you wish to specify options, you must specify the value 'undef'.
152           eg:
153
154             $ref = XMLin(undef, ForceArray => 1);
155
156       A string of XML
157           A string containing XML (recognised by the presence of '<' and '>'
158           characters) will be parsed directly.  eg:
159
160             $ref = XMLin('<opt username="bob" password="flurp" />');
161
162       An IO::Handle object
163           An IO::Handle object will be read to EOF and its contents parsed.
164           eg:
165
166             $fh = IO::File->new('/etc/params.xml');
167             $ref = XMLin($fh);
168
169       XMLout()
170
171       Takes a data structure (generally a hashref) and returns an XML encod‐
172       ing of that structure.  If the resulting XML is parsed using "XMLin()",
173       it should return a data structure equivalent to the original (see
174       caveats below).
175
       The "XMLout()" function can also be used to output the XML as SAX
       events (see the "Handler" option and "SAX SUPPORT" for more details).
178
179       When translating hashes to XML, hash keys which have a leading '-' will
180       be silently skipped.  This is the approved method for marking elements
181       of a data structure which should be ignored by "XMLout".  (Note: If
182       these items were not skipped the key names would be emitted as element
183       or attribute names with a leading '-' which would not be valid XML).
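
       A minimal sketch of the skipping behaviour described above.  The data
       and the key name '-secret' are illustrative assumptions, not something
       defined by the module:

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical data: the leading '-' marks a key that XMLout() should skip.
my $data = {
    name      => 'sahara',
    '-secret' => 'not for export',
};

# RootName is set explicitly so the output does not use the default 'opt'.
my $xml = XMLout($data, RootName => 'server');

print $xml;    # the 'name' attribute is emitted, '-secret' is not
```

       This is a convenient way to carry bookkeeping values alongside data
       that will later be written out.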
184
185       Caveats
186
187       Some care is required in creating data structures which will be passed
188       to "XMLout()".  Hash keys from the data structure will be encoded as
189       either XML element names or attribute names.  Therefore, you should use
190       hash key names which conform to the relatively strict XML naming rules:
191
192       Names in XML must begin with a letter.  The remaining characters may be
193       letters, digits, hyphens (-), underscores (_) or full stops (.).  It is
194       also allowable to include one colon (:) in an element name but this
195       should only be used when working with namespaces (XML::Simple can only
196       usefully work with namespaces when teamed with a SAX Parser).
197
       You can use other punctuation characters in hash values (just not in
       hash keys); however, XML::Simple does not support dumping binary data.
200
201       If you break these rules, the current implementation of "XMLout()" will
202       simply emit non-compliant XML which will be rejected if you try to read
203       it back in.  (A later version of XML::Simple might take a more proac‐
204       tive approach).
205
206       Note also that although you can nest hashes and arrays to arbitrary
207       levels, circular data structures are not supported and will cause
208       "XMLout()" to die.
209
210       If you wish to 'round-trip' arbitrary data structures from Perl to XML
211       and back to Perl, then you should probably disable array folding (using
212       the KeyAttr option) both with "XMLout()" and with "XMLin()".  If you
213       still don't get the expected results, you may prefer to use XML::Dumper
214       which is designed for exactly that purpose.
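
       The round-trip advice above might be sketched as follows.  The data
       structure is a made-up example; passing an empty list to KeyAttr in
       both directions is the only technique being illustrated:

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical data: an array of hashes whose order should survive the trip.
my $data = {
    server => [
        { name => 'gobi',   osname => 'irix'    },
        { name => 'sahara', osname => 'solaris' },
    ],
};

# An empty KeyAttr list disables unfolding on output and folding on input.
my $xml  = XMLout($data, KeyAttr => []);
my $copy = XMLin($xml, KeyAttr => [], ForceArray => ['server']);

print scalar @{ $copy->{server} }, " servers, first is $copy->{server}[0]{name}\n";
```

       ForceArray is included so that the result would still be an array if
       the list held only one server.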
215
216       Refer to "WHERE TO FROM HERE?" if "XMLout()" is too simple for your
217       needs.
218

OPTIONS

220       XML::Simple supports a number of options (in fact as each release of
221       XML::Simple adds more options, the module's claim to the name 'Simple'
222       becomes increasingly tenuous).  If you find yourself repeatedly having
223       to specify the same options, you might like to investigate "OPTIONAL OO
224       INTERFACE" below.
225
226       If you can't be bothered reading the documentation, refer to "STRICT
227       MODE" to automatically catch common mistakes.
228
229       Because there are so many options, it's hard for new users to know
230       which ones are important, so here are the two you really need to know
231       about:
232
233       ·   check out "ForceArray" because you'll almost certainly want to turn
234           it on
235
236       ·   make sure you know what the "KeyAttr" option does and what its
237           default value is because it may surprise you otherwise (note in
238           particular that 'KeyAttr' affects both "XMLin" and "XMLout")
239
240       The option name headings below have a trailing 'comment' - a hash fol‐
241       lowed by two pieces of metadata:
242
243       ·   Options are marked with 'in' if they are recognised by "XMLin()"
244           and 'out' if they are recognised by "XMLout()".
245
246       ·   Each option is also flagged to indicate whether it is:
247
248            'important'   - don't use the module until you understand this one
249            'handy'       - you can skip this on the first time through
250            'advanced'    - you can skip this on the second time through
251            'SAX only'    - don't worry about this unless you're using SAX (or
252                            alternatively if you need this, you also need SAX)
253            'seldom used' - you'll probably never use this unless you were the
254                            person that requested the feature
255
256       The options are listed alphabetically:
257
258       Note: option names are no longer case sensitive so you can use the
259       mixed case versions shown here; all lower case as required by versions
260       2.03 and earlier; or you can add underscores between the words (eg:
261       key_attr).
262
263       AttrIndent => 1 # out - handy
264
265       When you are using "XMLout()", enable this option to have attributes
266       printed one-per-line with sensible indentation rather than all on one
267       line.
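
       A short sketch of AttrIndent in use.  The attribute names and values
       are borrowed from the QUICK START example purely for illustration:

```perl
use strict;
use warnings;
use XML::Simple;

# Two scalar values become two attributes on the root element.
my $data = {
    logdir    => '/var/log/foo/',
    debugfile => '/tmp/foo.debug',
};

# With AttrIndent enabled, the second attribute starts on its own line.
my $xml = XMLout($data, RootName => 'config', AttrIndent => 1);

print $xml;
```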
268
269       Cache => [ cache schemes ] # in - advanced
270
271       Because loading the XML::Parser module and parsing an XML file can con‐
272       sume a significant number of CPU cycles, it is often desirable to cache
273       the output of "XMLin()" for later reuse.
274
275       When parsing from a named file, XML::Simple supports a number of
276       caching schemes.  The 'Cache' option may be used to specify one or more
277       schemes (using an anonymous array).  Each scheme will be tried in turn
278       in the hope of finding a cached pre-parsed representation of the XML
279       file.  If no cached copy is found, the file will be parsed and the
280       first cache scheme in the list will be used to save a copy of the
281       results.  The following cache schemes have been implemented:
282
283       storable
284           Utilises Storable.pm to read/write a cache file with the same name
285           as the XML file but with the extension .stor
286
287       memshare
288           When a file is first parsed, a copy of the resulting data structure
289           is retained in memory in the XML::Simple module's namespace.  Sub‐
290           sequent calls to parse the same file will return a reference to
291           this structure.  This cached version will persist only for the life
292           of the Perl interpreter (which in the case of mod_perl for example,
293           may be some significant time).
294
295           Because each caller receives a reference to the same data struc‐
296           ture, a change made by one caller will be visible to all.  For this
297           reason, the reference returned should be treated as read-only.
298
299       memcopy
300           This scheme works identically to 'memshare' (above) except that
301           each caller receives a reference to a new data structure which is a
302           copy of the cached version.  Copying the data structure will add a
303           little processing overhead, therefore this scheme should only be
304           used where the caller intends to modify the data structure (or
305           wishes to protect itself from others who might).  This scheme uses
306           Storable.pm to perform the copy.
307
308       Warning! The memory-based caching schemes compare the timestamp on the
309       file to the time when it was last parsed.  If the file is stored on an
310       NFS filesystem (or other network share) and the clock on the file
311       server is not exactly synchronised with the clock where your script is
312       run, updates to the source XML file may appear to be ignored.
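
       The following sketch shows the 'memcopy' scheme protecting the cached
       data from a caller's changes.  The temporary directory and the file
       name params.xml are assumptions made only so the example is
       self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Simple;

# Create a small XML file to parse (caching only applies to named files).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/params.xml";
open my $out, '>', $file or die "open($file): $!";
print {$out} '<opt username="bob" />';
close $out or die "close($file): $!";

# First caller parses the file and then modifies its own copy.
my $first = XMLin($file, Cache => ['memcopy']);
$first->{username} = 'changed';

# Second caller is unaffected because each caller receives a fresh copy.
my $second = XMLin($file, Cache => ['memcopy']);

print $second->{username}, "\n";    # prints "bob"
```

       With 'memshare' in place of 'memcopy', both callers could end up
       holding a reference to the same structure, which is why that scheme
       should be treated as read-only.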
313
314       ContentKey => 'keyname' # in+out - seldom used
315
       When text content is parsed to a hash value, this option lets you
       specify a name for the hash key to override the default 'content'.  So
       for example:
319
320         XMLin('<opt one="1">Text</opt>', ContentKey => 'text')
321
322       will parse to:
323
324         { 'one' => 1, 'text' => 'Text' }
325
326       instead of:
327
328         { 'one' => 1, 'content' => 'Text' }
329
330       "XMLout()" will also honour the value of this option when converting a
331       hashref to XML.
332
333       You can also prefix your selected key name with a '-' character to have
334       "XMLin()" try a little harder to eliminate unnecessary 'content' keys
335       after array folding.  For example:
336
337         XMLin(
338           '<opt><item name="one">First</item><item name="two">Second</item></opt>',
339           KeyAttr => {item => 'name'},
340           ForceArray => [ 'item' ],
341           ContentKey => '-content'
342         )
343
344       will parse to:
345
         {
           'item' => {
             'one' => 'First',
             'two' => 'Second'
           }
         }
352
353       rather than this (without the '-'):
354
         {
           'item' => {
             'one' => { 'content' => 'First' },
             'two' => { 'content' => 'Second' }
           }
         }
361
362       DataHandler => code_ref # in - SAX only
363
364       When you use an XML::Simple object as a SAX handler, it will return a
365       'simple tree' data structure in the same format as "XMLin()" would
366       return.  If this option is set (to a subroutine reference), then when
367       the tree is built the subroutine will be called and passed two argu‐
368       ments: a reference to the XML::Simple object and a reference to the
369       data tree.  The return value from the subroutine will be returned to
370       the SAX driver.  (See "SAX SUPPORT" for more details).
371
372       ForceArray => 1 # in - important
373
374       This option should be set to '1' to force nested elements to be repre‐
375       sented as arrays even when there is only one.  Eg, with ForceArray
376       enabled, this XML:
377
378           <opt>
379             <name>value</name>
380           </opt>
381
382       would parse to this:
383
384           {
385             'name' => [
386                         'value'
387                       ]
388           }
389
390       instead of this (the default):
391
392           {
393             'name' => 'value'
394           }
395
396       This option is especially useful if the data structure is likely to be
397       written back out as XML and the default behaviour of rolling single
398       nested elements up into attributes is not desirable.
399
400       If you are using the array folding feature, you should almost certainly
401       enable this option.  If you do not, single nested elements will not be
402       parsed to arrays and therefore will not be candidates for folding to a
       hash.  (Given that the default value of 'KeyAttr' enables array
       folding, this option should probably have been enabled by default
       too - sorry).
406
407       ForceArray => [ names ] # in - important
408
409       This alternative (and preferred) form of the 'ForceArray' option allows
410       you to specify a list of element names which should always be forced
411       into an array representation, rather than the 'all or nothing' approach
412       above.
413
414       It is also possible (since version 2.05) to include compiled regular
415       expressions in the list - any element names which match the pattern
416       will be forced to arrays.  If the list contains only a single regex,
417       then it is not necessary to enclose it in an arrayref.  Eg:
418
419         ForceArray => qr/_list$/
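
       As a rough illustration of the regex form, the sketch below forces
       only elements whose names end in '_list' into arrays.  The element
       names are invented for the example:

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical input: one ordinary element and one '_list' element.
my $xml = '<opt><owner>bob</owner><item_list>apple</item_list></opt>';

# A single compiled regex does not need to be wrapped in an arrayref.
my $ref = XMLin($xml, ForceArray => qr/_list$/);

print "owner is a plain scalar: $ref->{owner}\n";
print "item_list is an array holding: @{ $ref->{item_list} }\n";
```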
420
421       ForceContent => 1 # in - seldom used
422
423       When "XMLin()" parses elements which have text content as well as
424       attributes, the text content must be represented as a hash value rather
425       than a simple scalar.  This option allows you to force text content to
426       always parse to a hash value even when there are no attributes.  So for
427       example:
428
429         XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)
430
431       will parse to:
432
433         {
434           'x' => {           'content' => 'text1' },
435           'y' => { 'a' => 2, 'content' => 'text2' }
436         }
437
438       instead of:
439
440         {
441           'x' => 'text1',
442           'y' => { 'a' => 2, 'content' => 'text2' }
443         }
444
445       GroupTags => { grouping tag => grouped tag } # in+out - handy
446
447       You can use this option to eliminate extra levels of indirection in
448       your Perl data structure.  For example this XML:
449
         <opt>
           <searchpath>
             <dir>/usr/bin</dir>
             <dir>/usr/local/bin</dir>
             <dir>/usr/X11/bin</dir>
           </searchpath>
         </opt>
457
458       Would normally be read into a structure like this:
459
460         {
461           searchpath => {
462                           dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
463                         }
464         }
465
466       But when read in with the appropriate value for 'GroupTags':
467
468         my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });
469
470       It will return this simpler structure:
471
472         {
473           searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
474         }
475
476       The grouping element ("<searchpath>" in the example) must not contain
477       any attributes or elements other than the grouped element.
478
479       You can specify multiple 'grouping element' to 'grouped element' map‐
480       pings in the same hashref.  If this option is combined with "KeyAttr",
481       the array folding will occur first and then the grouped element names
482       will be eliminated.
483
484       "XMLout" will also use the grouptag mappings to re-introduce the tags
485       around the grouped elements.  Beware though that this will occur in all
486       places that the 'grouping tag' name occurs - you probably don't want to
487       use the same name for elements as well as attributes.
488
489       Handler => object_ref # out - SAX only
490
491       Use the 'Handler' option to have "XMLout()" generate SAX events rather
492       than returning a string of XML.  For more details see "SAX SUPPORT"
493       below.
494
495       Note: the current implementation of this option generates a string of
496       XML and uses a SAX parser to translate it into SAX events.  The normal
497       encoding rules apply here - your data must be UTF8 encoded unless you
498       specify an alternative encoding via the 'XMLDecl' option; and by the
499       time the data reaches the handler object, it will be in UTF8 form
500       regardless of the encoding you supply.  A future implementation of this
501       option may generate the events directly.
502
503       KeepRoot => 1 # in+out - handy
504
505       In its attempt to return a data structure free of superfluous detail
506       and unnecessary levels of indirection, "XMLin()" normally discards the
507       root element name.  Setting the 'KeepRoot' option to '1' will cause the
508       root element name to be retained.  So after executing this code:
509
510         $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)
511
       You'll be able to reference the tempdir as
       "$config->{config}->{tempdir}" instead of the default
       "$config->{tempdir}".
514
515       Similarly, setting the 'KeepRoot' option to '1' will tell "XMLout()"
516       that the data structure already contains a root element name and it is
517       not necessary to add another.
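
       A small sketch of the output side, assuming a data structure whose
       single top-level key ('config' here, chosen for illustration) is meant
       to be the root element:

```perl
use strict;
use warnings;
use XML::Simple;

# The outer hash has exactly one key, which names the root element.
my $data = { config => { tempdir => '/tmp' } };

# KeepRoot tells XMLout() not to wrap the data in a default <opt> element.
my $xml = XMLout($data, KeepRoot => 1);

print $xml;
```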
518
519       KeyAttr => [ list ] # in+out - important
520
521       This option controls the 'array folding' feature which translates
522       nested elements from an array to a hash.  It also controls the 'unfold‐
523       ing' of hashes to arrays.
524
525       For example, this XML:
526
527           <opt>
528             <user login="grep" fullname="Gary R Epstein" />
529             <user login="stty" fullname="Simon T Tyson" />
530           </opt>
531
532       would, by default, parse to this:
533
534           {
535             'user' => [
536                         {
537                           'login' => 'grep',
538                           'fullname' => 'Gary R Epstein'
539                         },
540                         {
541                           'login' => 'stty',
542                           'fullname' => 'Simon T Tyson'
543                         }
544                       ]
545           }
546
547       If the option 'KeyAttr => "login"' were used to specify that the
548       'login' attribute is a key, the same XML would parse to:
549
550           {
551             'user' => {
552                         'stty' => {
553                                     'fullname' => 'Simon T Tyson'
554                                   },
555                         'grep' => {
556                                     'fullname' => 'Gary R Epstein'
557                                   }
558                       }
559           }
560
561       The key attribute names should be supplied in an arrayref if there is
562       more than one.  "XMLin()" will attempt to match attribute names in the
563       order supplied.  "XMLout()" will use the first attribute name supplied
564       when 'unfolding' a hash into an array.
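
       The examples above show folding on input.  The reverse direction
       might look like the following sketch, which reuses the same user data
       and assumes 'login' is the only key attribute:

```perl
use strict;
use warnings;
use XML::Simple;

# A hash of hashes, keyed on what should become the 'login' attribute.
my $data = {
    user => {
        grep => { fullname => 'Gary R Epstein' },
        stty => { fullname => 'Simon T Tyson'  },
    },
};

# XMLout() unfolds the hash into repeated <user> elements and writes each
# hash key back out as a 'login' attribute.
my $xml = XMLout($data, KeyAttr => ['login']);

print $xml;
```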
565
       Note 1: The default value for 'KeyAttr' is ['name', 'key', 'id'].  If
       you do not want folding on input or unfolding on output you must set
       this option to an empty list to disable the feature.
569
570       Note 2: If you wish to use this option, you should also enable the
571       "ForceArray" option.  Without 'ForceArray', a single nested element
572       will be rolled up into a scalar rather than an array and therefore will
573       not be folded (since only arrays get folded).
574
575       KeyAttr => { list } # in+out - important
576
       This alternative (and preferred) method of specifying the key
       attributes allows more fine grained control over which elements are
       folded and on which attributes.  For example the option 'KeyAttr => {
       package => "id" }' will cause any package elements to be folded on the
       'id' attribute.  No other elements which have an 'id' attribute will be
       folded at all.
583
584       Note: "XMLin()" will generate a warning (or a fatal error in "STRICT
585       MODE") if this syntax is used and an element which does not have the
586       specified key attribute is encountered (eg: a 'package' element without
587       an 'id' attribute, to use the example above).  Warnings will only be
588       generated if -w is in force.
589
590       Two further variations are made possible by prefixing a '+' or a '-'
591       character to the attribute name:
592
593       The option 'KeyAttr => { user => "+login" }' will cause this XML:
594
595           <opt>
596             <user login="grep" fullname="Gary R Epstein" />
597             <user login="stty" fullname="Simon T Tyson" />
598           </opt>
599
600       to parse to this data structure:
601
602           {
603             'user' => {
604                         'stty' => {
605                                     'fullname' => 'Simon T Tyson',
606                                     'login'    => 'stty'
607                                   },
608                         'grep' => {
609                                     'fullname' => 'Gary R Epstein',
610                                     'login'    => 'grep'
611                                   }
612                       }
613           }
614
615       The '+' indicates that the value of the key attribute should be copied
616       rather than moved to the folded hash key.
617
618       A '-' prefix would produce this result:
619
620           {
621             'user' => {
622                         'stty' => {
623                                     'fullname' => 'Simon T Tyson',
624                                     '-login'    => 'stty'
625                                   },
626                         'grep' => {
627                                     'fullname' => 'Gary R Epstein',
628                                     '-login'    => 'grep'
629                                   }
630                       }
631           }
632
633       As described earlier, "XMLout" will ignore hash keys starting with a
634       '-'.
635
636       NoAttr => 1 # in+out - handy
637
638       When used with "XMLout()", the generated XML will contain no
639       attributes.  All hash key/values will be represented as nested elements
640       instead.
641
642       When used with "XMLin()", any attributes in the XML will be ignored.
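
       Both directions can be sketched briefly.  The data and the XML string
       are invented for the example:

```perl
use strict;
use warnings;
use XML::Simple;

# Output: scalar values become nested elements rather than attributes.
my $xml = XMLout({ name => 'sahara', osname => 'solaris' }, NoAttr => 1);
print $xml;

# Input: the 'id' attribute is ignored, the nested element is kept.
my $ref = XMLin('<opt id="7"><osname>irix</osname></opt>', NoAttr => 1);
print join(', ', sort keys %$ref), "\n";
```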
643
644       NoEscape => 1 # out - seldom used
645
       By default, "XMLout()" will translate the characters '<', '>', '&' and
       '"' to '&lt;', '&gt;', '&amp;' and '&quot;' respectively.  Use this
       option to suppress escaping (presumably because you've already escaped
       the data in some more sophisticated manner).
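
       To make the difference concrete, here is a sketch that writes the
       same (made-up) value with and without escaping:

```perl
use strict;
use warnings;
use XML::Simple;

my $data = { note => 'a < b & c' };

# Default behaviour: special characters are replaced by entities.
my $escaped = XMLout($data);

# NoEscape: the value is written exactly as supplied, so the caller is
# responsible for making sure the result is still well formed.
my $raw = XMLout($data, NoEscape => 1);

print $escaped, $raw;
```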
650
651       NoIndent => 1 # out - seldom used
652
653       Set this option to 1 to disable "XMLout()"'s default 'pretty printing'
654       mode.  With this option enabled, the XML output will all be on one line
655       (unless there are newlines in the data) - this may be easier for down‐
656       stream processing.
657
658       NoSort => 1 # out - seldom used
659
660       Newer versions of XML::Simple sort elements and attributes alphabeti‐
661       cally (*), by default.  Enable this option to suppress the sorting -
662       possibly for backwards compatibility.
663
664       * Actually, sorting is alphabetical but 'key' attribute or element
665       names (as in 'KeyAttr') sort first.  Also, when a hash of hashes is
666       'unfolded', the elements are sorted alphabetically by the value of the
667       key field.
668
       NormaliseSpace => 0 | 1 | 2 # in - handy
670
671       This option controls how whitespace in text content is handled.  Recog‐
672       nised values for the option are:
673
674       ·   0 = (default) whitespace is passed through unaltered (except of
675           course for the normalisation of whitespace in attribute values
676           which is mandated by the XML recommendation)
677
678       ·   1 = whitespace is normalised in any value used as a hash key (nor‐
679           malising means removing leading and trailing whitespace and col‐
680           lapsing sequences of whitespace characters to a single space)
681
682       ·   2 = whitespace is normalised in all text content
683
684       Note: you can spell this option with a 'z' if that is more natural for
685       you.
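
       A minimal sketch of level 2, using an invented element whose text has
       extra whitespace:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><note>  hello   world  </note></opt>';

# Level 2 trims leading and trailing whitespace and collapses internal
# runs of whitespace in all text content.
my $ref = XMLin($xml, NormaliseSpace => 2);

print "[$ref->{note}]\n";    # prints "[hello world]"
```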
686
687       NSExpand => 1 # in+out handy - SAX only
688
689       This option controls namespace expansion - the translation of element
690       and attribute names of the form 'prefix:name' to '{uri}name'.  For
691       example the element name 'xsl:template' might be expanded to:
692       '{http://www.w3.org/1999/XSL/Transform}template'.
693
       By default, "XMLin()" will return element names and attribute names
       exactly as they appear in the XML.  Setting this option to 1 will cause
       all element and attribute names to be expanded to include their
       namespace URI.
698
699       Note: You must be using a SAX parser for this option to work (ie: it
700       does not work with XML::Parser).
701
702       This option also controls whether "XMLout()" performs the reverse
703       translation from '{uri}name' back to 'prefix:name'.  The default is no
704       translation.  If your data contains expanded names, you should set this
705       option to 1 otherwise "XMLout" will emit XML which is not well formed.
706
707       Note: You must have the XML::NamespaceSupport module installed if you
708       want "XMLout()" to translate URIs back to prefixes.
709
       NumericEscape => 0 | 1 | 2 # out - handy
711
712       Use this option to have 'high' (non-ASCII) characters in your Perl data
713       structure converted to numeric entities (eg: &#8364;) in the XML out‐
714       put.  Three levels are possible:
715
716       0 - default: no numeric escaping (OK if you're writing out UTF8)
717
718       1 - only characters above 0xFF are escaped (ie: characters in the
719       0x80-FF range are not escaped), possibly useful with ISO8859-1 output
720
721       2 - all characters above 0x7F are escaped (good for plain ASCII output)
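
       For instance, level 2 could be used to keep the output within plain
       ASCII.  The 'price' key and its value are illustrative only:

```perl
use strict;
use warnings;
use XML::Simple;

# "\x{20AC}" is the euro sign, a character well above 0x7F.
my $data = { price => "\x{20AC}5" };

# Level 2 replaces every non-ASCII character with a numeric entity.
my $xml = XMLout($data, NumericEscape => 2);

print $xml;
```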
722
723       OutputFile => <file specifier> # out - handy
724
725       The default behaviour of "XMLout()" is to return the XML as a string.
726       If you wish to write the XML to a file, simply supply the filename
727       using the 'OutputFile' option.
728
729       This option also accepts an IO handle object - especially useful in
730       Perl 5.8.0 and later for output using an encoding other than UTF-8, eg:
731
732         open my $fh, '>:encoding(iso-8859-1)', $path or die "open($path): $!";
733         XMLout($ref, OutputFile => $fh);
734
735       Note, XML::Simple does not require that the object you pass in to the
736       OutputFile option inherits from IO::Handle - it simply assumes the
737       object supports a "print" method.
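
       As a sketch of that last point, the following passes an object of
       a made-up class (My::Collector is hypothetical, not part of any
       distribution) whose only relevant feature is a "print" method:

```perl
use strict;
use warnings;
use XML::Simple;

{
    # Hypothetical sink class: it only needs to provide print().
    package My::Collector;
    sub new   { my $class = shift; return bless { chunks => [] }, $class }
    sub print { my $self = shift; push @{ $self->{chunks} }, @_; return 1 }
    sub text  { my $self = shift; return join '', @{ $self->{chunks} } }
}

my $sink = My::Collector->new;
XMLout({ name => 'sahara' }, OutputFile => $sink, RootName => 'server');

# The generated XML was handed to the object rather than returned.
my $xml = $sink->text;
print $xml;
```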
738
739       ParserOpts => [ XML::Parser Options ] # in - don't use this
740
741       Note: This option is now officially deprecated.  If you find it useful,
742       email the author with an example of what you use it for.  Do not use
743       this option to set the ProtocolEncoding, that's just plain wrong - fix
744       the XML.
745
746       This option allows you to pass parameters to the constructor of the
747       underlying XML::Parser object (which of course assumes you're not using
748       SAX).
749
750       RootName => 'string' # out - handy
751
752       By default, when "XMLout()" generates XML, the root element will be
753       named 'opt'.  This option allows you to specify an alternative name.
754
755       Specifying either undef or the empty string for the RootName option
756       will produce XML with no root elements.  In most cases the resulting
757       XML fragment will not be 'well formed' and therefore could not be read
758       back in by "XMLin()".  Nevertheless, the option has been found to be
759       useful in certain circumstances.
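
       A minimal sketch of the three cases follows.  The data structure is
       invented for the example and KeyAttr is set to an empty list purely
       to keep array folding out of the picture:

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical data: two nested <server> elements.
my $data = { server => [ { name => 'sahara' }, { name => 'gobi' } ] };

my $default  = XMLout($data, KeyAttr => []);                       # root is <opt>
my $named    = XMLout($data, KeyAttr => [], RootName => 'config'); # root is <config>
my $fragment = XMLout($data, KeyAttr => [], RootName => undef);    # no root element

print $default, $named, $fragment;
```

       The last call yields two sibling <server> elements with nothing
       wrapped around them, which is why it cannot be fed back to "XMLin()".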
760
761       SearchPath => [ list ] # in - handy
762
763       If you pass "XMLin()" a filename, but the filename includes no directory
764       component, you can use this option to specify which directories should
765       be searched to locate the file.  You might use this option to search
766       first in the user's home directory, then in a global directory such as
767       /etc.
768
769       If a filename is provided to "XMLin()" but SearchPath is not defined,
770       the file is assumed to be in the current directory.
771
772       If the first parameter to "XMLin()" is undefined, the default Search‐
773       Path will contain only the directory in which the script itself is
774       located.  Otherwise the default SearchPath will be empty.
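
       The search can be tried out with a self-contained sketch.  Two
       temporary directories stand in for the user's home directory and a
       global directory, and the file name is made up for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use XML::Simple;

# Throwaway directories standing in for "per-user" and "global" locations.
my $user_dir   = tempdir(CLEANUP => 1);
my $global_dir = tempdir(CLEANUP => 1);

# Only the "global" directory contains the (hypothetical) file.
my $path = File::Spec->catfile($global_dir, 'searchpath_demo.xml');
open my $fh, '>', $path or die "open($path): $!";
print {$fh} '<config logdir="/var/log/foo/" />';
close $fh or die "close($path): $!";

# Directories are tried in order; the first one holding the file is used.
my $config = XMLin('searchpath_demo.xml',
                   SearchPath => [ $user_dir, $global_dir ]);
print $config->{logdir}, "\n";
```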
775
776       SuppressEmpty => 1 | '' | undef # in+out - handy
777
778       This option controls what "XMLin()" should do with empty elements (no
779       attributes and no content).  The default behaviour is to represent them
780       as empty hashes.  Setting this option to a true value (eg: 1) will
781       cause empty elements to be skipped altogether.  Setting the option to
782       'undef' or the empty string will cause empty elements to be represented
783       as the undefined value or the empty string respectively.  The latter
784       two alternatives are a little easier to test for in your code than a
785       hash with no keys.
786
787       The option also controls what "XMLout()" does with undefined values.
788       Setting the option to undef causes undefined values to be output as
789       empty elements (rather than empty attributes), it also suppresses the
790       generation of warnings about undefined values.  Setting the option to a
791       true value (eg: 1) causes undefined values to be skipped altogether on
792       output.
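
       The alternatives can be compared side by side.  This is only a
       sketch using an invented document with one empty <note> element:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><name>sahara</name><note /></opt>';

my $default  = XMLin($xml);                          # note => {}
my $skipped  = XMLin($xml, SuppressEmpty => 1);      # no 'note' key at all
my $as_undef = XMLin($xml, SuppressEmpty => undef);  # note => undef
my $as_empty = XMLin($xml, SuppressEmpty => '');     # note => ''

# On output, a true value drops undefined values; undef emits empty elements.
my $data      = { name => 'sahara', note => undef };
my $out_skip  = XMLout($data, SuppressEmpty => 1);
my $out_empty = XMLout($data, SuppressEmpty => undef);

print $out_skip, $out_empty;
```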
793
794       ValueAttr => [ names ] # in - handy
795
796       Use this option to deal with elements which always have a single attribute
797       and no content.  Eg:
798
799         <opt>
800           <colour value="red" />
801           <size   value="XXL" />
802         </opt>
803
804       Setting "ValueAttr => [ 'value' ]" will cause the above XML to parse
805       to:
806
807         {
808           colour => 'red',
809           size   => 'XXL'
810         }
811
812       instead of this (the default):
813
814         {
815           colour => { value => 'red' },
816           size   => { value => 'XXL' }
817         }
818
819       Note: This form of the ValueAttr option is not compatible with
820       "XMLout()" - since the attribute name is discarded at parse time, the
821       original XML cannot be reconstructed.
822
823       ValueAttr => { element => attribute, ... } # in+out - handy
824
825       This (preferred) form of the ValueAttr option requires you to specify
826       both the element and the attribute names.  This is not only safer, it
827       also allows the original XML to be reconstructed by "XMLout()".
828
829       Note: You probably don't want to use this option and the NoAttr option
830       at the same time.
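
       A round trip with the hash form might look like the following
       sketch, which reuses the <colour> and <size> elements from the
       example above:

```perl
use strict;
use warnings;
use XML::Simple;

# Map each element name to the attribute that holds its value.
my %value_attr = ( colour => 'value', size => 'value' );

my $xml = '<opt><colour value="red" /><size value="XXL" /></opt>';

# On input the single attribute is rolled up into a plain scalar ...
my $ref = XMLin($xml, ValueAttr => \%value_attr);
print "$ref->{colour} $ref->{size}\n";

# ... and on output the same mapping puts the attribute back.
my $out = XMLout($ref, ValueAttr => \%value_attr);
print $out;
```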
831
832       Variables => { name => value } # in - handy
833
834       This option allows variables in the XML to be expanded when the file is
835       read.  (There is no facility for putting the variable names back if you
836       regenerate XML using "XMLout".)
837
838       A 'variable' is any text of the form "${name}" which occurs in an
839       attribute value or in the text content of an element.  If 'name'
840       matches a key in the supplied hashref, "${name}" will be replaced with
841       the corresponding value from the hashref.  If no matching key is found,
842       the variable will not be replaced.  Names must match the regex:
843       "[\w.]+" (ie: only 'word' characters and dots are allowed).
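
       For instance, assuming a hashref with the made-up keys 'logdir'
       and 'app', expansion in both an attribute value and text content
       can be sketched as:

```perl
use strict;
use warnings;
use XML::Simple;

# Single quotes keep Perl from interpolating ${...} before XMLin sees it.
my $xml = '<opt>'
        . '<logfile dir="${logdir}">${logdir}/${app}.log</logfile>'
        . '<other>${unknown}</other>'
        . '</opt>';

my $ref = XMLin($xml, Variables => { logdir => '/var/log', app => 'foo' });

print $ref->{logfile}{dir}, "\n";      # attribute value, expanded
print $ref->{logfile}{content}, "\n";  # text content, expanded
print $ref->{other}, "\n";             # no matching key, left untouched
```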
844
845       VarAttr => 'attr_name' # in - handy
846
847       In addition to the variables defined using "Variables", this option
848       allows variables to be defined in the XML.  A variable definition con‐
849       sists of an element with an attribute called 'attr_name' (the value of
850       the "VarAttr" option).  The value of the attribute will be used as the
851       variable name and the text content of the element will be used as the
852       value.  A variable defined in this way will override a variable defined
853       using the "Variables" option.  For example:
854
855         XMLin( '<opt>
856                   <dir name="prefix">/usr/local/apache</dir>
857                   <dir name="exec_prefix">${prefix}</dir>
858                   <dir name="bindir">${exec_prefix}/bin</dir>
859                 </opt>',
860                VarAttr => 'name', ContentKey => '-content'
861               );
862
863       produces the following data structure:
864
865         {
866           dir => {
867                    prefix      => '/usr/local/apache',
868                    exec_prefix => '/usr/local/apache',
869                    bindir      => '/usr/local/apache/bin',
870                  }
871         }
872
873       XMLDecl => 1  or  XMLDecl => 'string'  # out - handy
874
875       If you want the output from "XMLout()" to start with the optional XML
876       declaration, simply set the option to '1'.  The default XML declaration
877       is:
878
879               <?xml version='1.0' standalone='yes'?>
880
881       If you want some other string (for example to declare an encoding
882       value), set the value of this option to the complete string you
883       require.
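
       Both forms are shown in this short sketch.  Note that supplying a
       declaration string does not transcode anything; the encoding named
       here is only an example and the data is plain ASCII:

```perl
use strict;
use warnings;
use XML::Simple;

my $data = { logdir => '/var/log/foo/' };

# '1' selects the default declaration.
my $default_decl = XMLout($data, XMLDecl => 1);

# Any other value is used verbatim as the first line of the output.
my $custom = q{<?xml version="1.0" encoding="iso-8859-1"?>};
my $custom_decl = XMLout($data, XMLDecl => $custom);

print $default_decl, $custom_decl;
```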
884

OPTIONAL OO INTERFACE

886       The procedural interface is both simple and convenient however there
887       are a couple of reasons why you might prefer to use the object oriented
888       (OO) interface:
889
890       ·   to define a set of default values which should be used on all sub‐
891           sequent calls to "XMLin()" or "XMLout()"
892
893       ·   to override methods in XML::Simple to provide customised behaviour
894
895       The default values for the options described above are unlikely to suit
896       everyone.  The OO interface allows you to effectively override
897       XML::Simple's defaults with your preferred values.  It works like this:
898
899       First create an XML::Simple parser object with your preferred defaults:
900
901         my $xs = XML::Simple->new(ForceArray => 1, KeepRoot => 1);
902
903       then call "XMLin()" or "XMLout()" as a method of that object:
904
905         my $ref = $xs->XMLin($xml);
906         my $xml = $xs->XMLout($ref);
907
908       You can also specify options when you make the method calls and these
909       values will be merged with the values specified when the object was
910       created.  Values specified in a method call take precedence.
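
       The merging can be seen in a small sketch, where one call relies on
       the object's defaults and the next overrides a single option (the
       XML is invented for the example):

```perl
use strict;
use warnings;
use XML::Simple;

my $xs  = XML::Simple->new(ForceArray => 1, KeyAttr => []);
my $xml = '<opt><server>sahara</server></opt>';

# Uses the defaults from new(): nested elements are forced into arrays.
my $with_defaults = $xs->XMLin($xml);
print $with_defaults->{server}[0], "\n";

# ForceArray from the method call wins; KeyAttr is still taken from new().
my $overridden = $xs->XMLin($xml, ForceArray => 0);
print $overridden->{server}, "\n";
```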
911
912       Overriding methods is a more advanced topic but might be useful if for
913       example you wished to provide an alternative routine for escaping char‐
914       acter data (the escape_value method) or for building the initial parse
915       tree (the build_tree method).
916
917       Note: when called as methods, the "XMLin()" and "XMLout()" routines may
918       be called as "xml_in()" or "xml_out()".  The method names are aliased
919       so the only difference is the aesthetics.
920

STRICT MODE

922       If you import the XML::Simple routines like this:
923
924         use XML::Simple qw(:strict);
925
926       the following common mistakes will be detected and treated as fatal
927       errors:
928
929       ·   Failing to explicitly set the "KeyAttr" option - if you can't be
930           bothered reading about this option, turn it off with: KeyAttr => [
931           ]
932
933       ·   Failing to explicitly set the "ForceArray" option - if you can't be
934           bothered reading about this option, set it to the safest mode with:
935           ForceArray => 1
936
937       ·   Setting ForceArray to an array, but failing to list all the ele‐
938           ments from the KeyAttr hash.
939
940       ·   Data error - KeyAttr is set to say { part => 'partnum' } but the
941           XML contains one or more <part> elements without a 'partnum'
942           attribute (or nested element).  Note: if strict mode is not set but
943           -w is, this condition triggers a warning.
944
945       ·   Data error - as above, but value of key attribute (eg: partnum) is
946           not a scalar string (due to nested elements etc).  This will also
947           trigger a warning if strict mode is not enabled.
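
       The first two checks can be demonstrated with a sketch like the
       one below.  The <part> data is made up and the exact wording of the
       error message is not relied upon:

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><part partnum="555-1234" desc="widget" /></opt>';

# Neither KeyAttr nor ForceArray is given, so strict mode makes this fatal.
my $ref   = eval { XMLin($xml) };
my $error = $@;
print "strict mode rejected the call: $error" if $error;

# Setting both options explicitly (and listing 'part' in ForceArray) is fine.
my $parts = XMLin($xml,
                  KeyAttr    => { part => 'partnum' },
                  ForceArray => [ 'part' ]);
print $parts->{part}{'555-1234'}{desc}, "\n";
```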
948

SAX SUPPORT

950       From version 1.08_01, XML::Simple includes support for SAX (the Simple
951       API for XML) - specifically SAX2.
952
953       In a typical SAX application, an XML parser (or SAX 'driver') module
954       generates SAX events (start of element, character data, end of element,
955       etc) as it parses an XML document and a 'handler' module processes the
956       events to extract the required data.  This simple model allows for some
957       interesting and powerful possibilities:
958
959       ·   Applications written to the SAX API can extract data from huge XML
960           documents without the memory overheads of a DOM or tree API.
961
962       ·   The SAX API allows for plug and play interchange of parser modules
963           without having to change your code to fit a new module's API.  A
964           number of SAX parsers are available with capabilities ranging from
965           extreme portability to blazing performance.
966
967       ·   A SAX 'filter' module can implement both a handler interface for
968           receiving data and a generator interface for passing modified data
969           on to a downstream handler.  Filters can be chained together in
970           'pipelines'.
971
972       ·   One filter module might split a data stream to direct data to two
973           or more downstream handlers.
974
975       ·   Generating SAX events is not the exclusive preserve of XML parsing
976           modules.  For example, a module might extract data from a rela‐
977           tional database using DBI and pass it on to a SAX pipeline for fil‐
978           tering and formatting.
979
980       XML::Simple can operate at either end of a SAX pipeline.  For example,
981       you can take a data structure in the form of a hashref and pass it into
982       a SAX pipeline using the 'Handler' option on "XMLout()":
983
984         use XML::Simple;
985         use Some::SAX::Filter;
986         use XML::SAX::Writer;
987
988         my $ref = {
989                      ....   # your data here
990                   };
991
992         my $writer = XML::SAX::Writer->new();
993         my $filter = Some::SAX::Filter->new(Handler => $writer);
994         my $simple = XML::Simple->new(Handler => $filter);
995         $simple->XMLout($ref);
996
997       You can also put XML::Simple at the opposite end of the pipeline to
998       take advantage of the simple 'tree' data structure once the relevant
999       data has been isolated through filtering:
1000
1001         use XML::SAX;
1002         use Some::SAX::Filter;
1003         use XML::Simple;
1004
1005         my $simple = XML::Simple->new(ForceArray => 1, KeyAttr => ['partnum']);
1006         my $filter = Some::SAX::Filter->new(Handler => $simple);
1007         my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1008
1009         my $ref = $parser->parse_uri('some_huge_file.xml');
1010
1011         print $ref->{part}->{'555-1234'};
1012
1013       You can build a filter by using an XML::Simple object as a handler and
1014       setting its DataHandler option to point to a routine which takes the
1015       resulting tree, modifies it and sends it off as SAX events to a down‐
1016       stream handler:
1017
1018         my $writer = XML::SAX::Writer->new();
1019         my $filter = XML::Simple->new(
1020                        DataHandler => sub {
1021                                         my $simple = shift;
1022                                         my $data = shift;
1023
1024                                         # Modify $data here
1025
1026                                         $simple->XMLout($data, Handler => $writer);
1027                                       }
1028                      );
1029         my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1030
1031         $parser->parse_uri($filename);
1032
1033       Note: In this last example, the 'Handler' option was specified in the
1034       call to "XMLout()" but it could also have been specified in the con‐
1035       structor.
1036

ENVIRONMENT

1038       If you don't care which parser module XML::Simple uses then skip this
1039       section entirely (it looks more complicated than it really is).
1040
1041       XML::Simple will default to using a SAX parser if one is available or
1042       XML::Parser if SAX is not available.
1043
1044       You can dictate which parser module is used by setting either the envi‐
1045       ronment variable 'XML_SIMPLE_PREFERRED_PARSER' or the package variable
1046       $XML::Simple::PREFERRED_PARSER to contain the module name.  The follow‐
1047       ing rules are used:
1048
1049       ·   The package variable takes precedence over the environment variable
1050           if both are defined.  To force XML::Simple to ignore the environ‐
1051           ment settings and use its default rules, you can set the package
1052           variable to an empty string.
1053
1054       ·   If the 'preferred parser' is set to the string 'XML::Parser', then
1055           XML::Parser will be used (or "XMLin()" will die if XML::Parser is
1056           not installed).
1057
1058       ·   If the 'preferred parser' is set to some other value, then it is
1059           assumed to be the name of a SAX parser module and is passed to
1060           XML::SAX::ParserFactory.  If XML::SAX is not installed, or the
1061           requested parser module is not installed, then "XMLin()" will die.
1062
1063       ·   If the 'preferred parser' is not defined at all (the normal default
1064           state), an attempt will be made to load XML::SAX.  If XML::SAX is
1065           installed, then a parser module will be selected according to
1066           XML::SAX::ParserFactory's normal rules (which typically means the
1067           last SAX parser installed).
1068
1069       ·   If the 'preferred parser' is not defined and XML::SAX is not
1070           installed, then XML::Parser will be used.  "XMLin()" will die if
1071           XML::Parser is not installed.
1072
1073       Note: The XML::SAX distribution includes an XML parser written entirely
1074       in Perl.  It is very portable but it is not very fast.  You should con‐
1075       sider installing XML::LibXML or XML::SAX::Expat if they are available
1076       for your platform.
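
       As a sketch of the package variable in use, the following asks for
       XML::Parser for a single call.  It assumes nothing about which
       parsers are installed, so the call is wrapped in an eval:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt username="testuser" />';

# local() restores the previous preference when the block is left.
my $ref = do {
    local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';
    eval { XMLin($xml) };
};

if (defined $ref) {
    print "parsed with XML::Parser: $ref->{username}\n";
}
else {
    print "XML::Parser could not be used: $@";
}
```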
1077

ERROR HANDLING

1079       The XML standard is very clear on the issue of non-compliant documents.
1080       An error in parsing any single element (for example a missing end tag)
1081       must cause the whole document to be rejected.  XML::Simple will die
1082       with an appropriate message if it encounters a parsing error.
1083
1084       If dying is not appropriate for your application, you should arrange to
1085       call "XMLin()" in an eval block and look for errors in $@.  eg:
1086
1087           my $config = eval { XMLin() };
1088           PopUpMessage($@) if($@);
1089
1090       Note, there is a common misconception that use of eval will signifi‐
1091       cantly slow down a script.  While that may be true when the code being
1092       eval'd is in a string, it is not true of code like the sample above.
1093

EXAMPLES

1095       When "XMLin()" reads the following very simple piece of XML:
1096
1097           <opt username="testuser" password="frodo"></opt>
1098
1099       it returns the following data structure:
1100
1101           {
1102             'username' => 'testuser',
1103             'password' => 'frodo'
1104           }
1105
1106       The identical result could have been produced with this alternative
1107       XML:
1108
1109           <opt username="testuser" password="frodo" />
1110
1111       Or this (although see 'ForceArray' option for variations):
1112
1113           <opt>
1114             <username>testuser</username>
1115             <password>frodo</password>
1116           </opt>
1117
1118       Repeated nested elements are represented as anonymous arrays:
1119
1120           <opt>
1121             <person firstname="Joe" lastname="Smith">
1122               <email>joe@smith.com</email>
1123               <email>jsmith@yahoo.com</email>
1124             </person>
1125             <person firstname="Bob" lastname="Smith">
1126               <email>bob@smith.com</email>
1127             </person>
1128           </opt>
1129
1130           {
1131             'person' => [
1132                           {
1133                             'email' => [
1134                                          'joe@smith.com',
1135                                          'jsmith@yahoo.com'
1136                                        ],
1137                             'firstname' => 'Joe',
1138                             'lastname' => 'Smith'
1139                           },
1140                           {
1141                             'email' => 'bob@smith.com',
1142                             'firstname' => 'Bob',
1143                             'lastname' => 'Smith'
1144                           }
1145                         ]
1146           }
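
       Note that Bob's single email address is a scalar while Joe's two
       addresses are an array.  A sketch of how the 'ForceArray' option
       might be used to make the two consistent (KeyAttr is disabled here
       only to keep the example focused):

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<opt>
  <person firstname="Joe" lastname="Smith">
    <email>joe@smith.com</email>
    <email>jsmith@yahoo.com</email>
  </person>
  <person firstname="Bob" lastname="Smith">
    <email>bob@smith.com</email>
  </person>
</opt>
XML

# Force <email> into an array even when it occurs only once.
my $ref = XMLin($xml, ForceArray => [ 'email' ], KeyAttr => []);

for my $person (@{ $ref->{person} }) {
    print "$person->{firstname}: @{ $person->{email} }\n";
}
```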
1147
1148       Nested elements with a recognised key attribute are transformed
1149       (folded) from an array into a hash keyed on the value of that attribute
1150       (see the "KeyAttr" option):
1151
1152           <opt>
1153             <person key="jsmith" firstname="Joe" lastname="Smith" />
1154             <person key="tsmith" firstname="Tom" lastname="Smith" />
1155             <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
1156           </opt>
1157
1158           {
1159             'person' => {
1160                           'jbloggs' => {
1161                                          'firstname' => 'Joe',
1162                                          'lastname' => 'Bloggs'
1163                                        },
1164                           'tsmith' => {
1165                                         'firstname' => 'Tom',
1166                                         'lastname' => 'Smith'
1167                                       },
1168                           'jsmith' => {
1169                                         'firstname' => 'Joe',
1170                                         'lastname' => 'Smith'
1171                                       }
1172                         }
1173           }
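
       Once folded, a record can be looked up directly by its key.  A
       minimal sketch using a shortened version of the XML above, with the
       options spelled out explicitly:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<opt>
  <person key="jsmith" firstname="Joe" lastname="Smith" />
  <person key="tsmith" firstname="Tom" lastname="Smith" />
</opt>
XML

# Fold <person> elements into a hash keyed on their 'key' attribute.
my $ref = XMLin($xml, KeyAttr => [ 'key' ], ForceArray => [ 'person' ]);

print $ref->{person}{tsmith}{firstname}, "\n";

for my $key (sort keys %{ $ref->{person} }) {
    print "$key: $ref->{person}{$key}{lastname}\n";
}
```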
1174
1175       The <anon> tag can be used to form anonymous arrays:
1176
1177           <opt>
1178             <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
1179             <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
1180             <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
1181             <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
1182           </opt>
1183
1184           {
1185             'head' => [
1186                         [ 'Col 1', 'Col 2', 'Col 3' ]
1187                       ],
1188             'data' => [
1189                         [ 'R1C1', 'R1C2', 'R1C3' ],
1190                         [ 'R2C1', 'R2C2', 'R2C3' ],
1191                         [ 'R3C1', 'R3C2', 'R3C3' ]
1192                       ]
1193           }
1194
1195       Anonymous arrays can be nested to arbitrary levels and as a special
1196       case, if the surrounding tags for an XML document contain only an
1197       anonymous array the arrayref will be returned directly rather than the
1198       usual hashref:
1199
1200           <opt>
1201             <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
1202             <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
1203             <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
1204           </opt>
1205
1206           [
1207             [ 'Col 1', 'Col 2' ],
1208             [ 'R1C1', 'R1C2' ],
1209             [ 'R2C1', 'R2C2' ]
1210           ]
1211
1212       Elements which only contain text content will simply be represented as
1213       a scalar.  Where an element has both attributes and text content, the
1214       element will be represented as a hashref with the text content in the
1215       'content' key (see the "ContentKey" option):
1216
1217         <opt>
1218           <one>first</one>
1219           <two attr="value">second</two>
1220         </opt>
1221
1222         {
1223           'one' => 'first',
1224           'two' => { 'attr' => 'value', 'content' => 'second' }
1225         }
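
       The name of that key can be changed.  A short sketch, in which the
       replacement key name 'text' is an arbitrary choice:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><one>first</one><two attr="value">second</two></opt>';

# Default: text content alongside attributes lands in the 'content' key.
my $default = XMLin($xml);
print $default->{two}{content}, "\n";

# ContentKey picks a different key name for the same text.
my $renamed = XMLin($xml, ContentKey => 'text');
print $renamed->{two}{text}, "\n";
```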
1226
1227       Mixed content (elements which contain both text content and nested
1228       elements) will not be represented in a useful way - element order and
1229       significant whitespace will be lost.  If you need to work with mixed
1230       content, then XML::Simple is not the right tool for your job - check
1231       out the next section.
1232

WHERE TO FROM HERE?

1234       XML::Simple is able to present a simple API because it makes some
1235       assumptions on your behalf.  These include:
1236
1237       ·   You're not interested in text content consisting only of whitespace
1238
1239       ·   You don't mind that when things get slurped into a hash the order
1240           is lost
1241
1242       ·   You don't want fine-grained control of the formatting of generated
1243           XML
1244
1245       ·   You would never use a hash key that was not a legal XML element
1246           name
1247
1248       ·   You don't need help converting between different encodings
1249
1250       In a serious XML project, you'll probably outgrow these assumptions
1251       fairly quickly.  This section of the document used to offer some advice
1252       on choosing a more powerful option.  That advice has now grown into the
1253       'Perl-XML FAQ' document which you can find at:
1254       <http://perl-xml.sourceforge.net/faq/>
1255
1256       The advice in the FAQ boils down to a quick explanation of tree versus
1257       event based parsers and then recommends:
1258
1259       For event based parsing, use SAX (do not set out to write any new code
1260       for XML::Parser's handler API - it is obsolete).
1261
1262       For tree-based parsing, you could choose between the 'Perlish' approach
1263       of XML::Twig and more standards based DOM implementations - preferably
1264       one with XPath support.
1265

SEE ALSO

1267       XML::Simple requires either XML::Parser or XML::SAX.
1268
1269       To generate documents with namespaces, XML::NamespaceSupport is
1270       required.
1271
1272       The optional caching functions require Storable.
1273
1274       Answers to Frequently Asked Questions about XML::Simple are bundled
1275       with this distribution as: XML::Simple::FAQ
1276

COPYRIGHT

1278       Copyright 1999-2004 Grant McLean <grantm@cpan.org>
1279
1280       This library is free software; you can redistribute it and/or modify it
1281       under the same terms as Perl itself.
1282
1283
1284
1285perl v5.8.8                       2004-11-19                    XML::Simple(3)