XML::Simple(3)        User Contributed Perl Documentation       XML::Simple(3)

NAME

       XML::Simple - Easy API to maintain XML (esp config files)

SYNOPSIS

           use XML::Simple;

           my $ref = XMLin([<xml file or string>] [, <options>]);

           my $xml = XMLout($hashref [, <options>]);

       Or the object oriented way:

           require XML::Simple;

           my $xs = XML::Simple->new(options);

           my $ref = $xs->XMLin([<xml file or string>] [, <options>]);

           my $xml = $xs->XMLout($hashref [, <options>]);

       (or see "SAX SUPPORT" for 'the SAX way').

       To catch common errors:

           use XML::Simple qw(:strict);

       (see "STRICT MODE" for more details).

QUICK START

34       Say you have a script called foo and a file of configuration options
35       called foo.xml containing this:
36
37         <config logdir="/var/log/foo/" debugfile="/tmp/foo.debug">
38           <server name="sahara" osname="solaris" osversion="2.6">
39             <address>10.0.0.101</address>
40             <address>10.0.1.101</address>
41           </server>
42           <server name="gobi" osname="irix" osversion="6.5">
43             <address>10.0.0.102</address>
44           </server>
45           <server name="kalahari" osname="linux" osversion="2.0.34">
46             <address>10.0.0.103</address>
47             <address>10.0.1.103</address>
48           </server>
49         </config>
50
51       The following lines of code in foo:
52
53         use XML::Simple;
54
55         my $config = XMLin();
56
       will 'slurp' the configuration options into the hashref $config
       (because no arguments are passed to "XMLin()", the name and location
       of the XML file will be inferred from the name and location of the
       script).  You can dump out the contents of the hashref using
       Data::Dumper:
61
62         use Data::Dumper;
63
64         print Dumper($config);
65
66       which will produce something like this (formatting has been adjusted
67       for brevity):
68
69         {
70             'logdir'        => '/var/log/foo/',
71             'debugfile'     => '/tmp/foo.debug',
72             'server'        => {
73                 'sahara'        => {
74                     'osversion'     => '2.6',
75                     'osname'        => 'solaris',
76                     'address'       => [ '10.0.0.101', '10.0.1.101' ]
77                 },
78                 'gobi'          => {
79                     'osversion'     => '6.5',
80                     'osname'        => 'irix',
81                     'address'       => '10.0.0.102'
82                 },
83                 'kalahari'      => {
84                     'osversion'     => '2.0.34',
85                     'osname'        => 'linux',
86                     'address'       => [ '10.0.0.103', '10.0.1.103' ]
87                 }
88             }
89         }
90
91       Your script could then access the name of the log directory like this:
92
93         print $config->{logdir};
94
95       similarly, the second address on the server 'kalahari' could be
96       referenced as:
97
98         print $config->{server}->{kalahari}->{address}->[1];
99
100       What could be simpler?  (Rhetorical).
101
102       For simple requirements, that's really all there is to it.  If you want
103       to store your XML in a different directory or file, or pass it in as a
104       string or even pass it in via some derivative of an IO::Handle, you'll
105       need to check out "OPTIONS".  If you want to turn off or tweak the
106       array folding feature (that neat little transformation that produced
107       $config->{server}) you'll find options for that as well.
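
       As a minimal sketch of tweaking those defaults, the following passes
       the XML as a string and names the folding key and the elements to
       force into arrays explicitly.  The trimmed-down XML and the @summary
       variable are illustrative assumptions, not part of the foo script:

```perl
use strict;
use warnings;
use XML::Simple;

# A cut-down version of foo.xml, passed as a string instead of a file.
my $xml = <<'XML';
<config logdir="/var/log/foo/">
  <server name="sahara">
    <address>10.0.0.101</address>
    <address>10.0.1.101</address>
  </server>
  <server name="gobi">
    <address>10.0.0.102</address>
  </server>
</config>
XML

# Fold <server> elements on their 'name' attribute and always represent
# <server> and <address> as arrays, even when only one is present.
my $config = XMLin(
    $xml,
    KeyAttr    => { server => 'name' },
    ForceArray => [ 'server', 'address' ],
);

my @summary;
for my $name (sort keys %{ $config->{server} }) {
    # Safe to dereference as an array for every server, including 'gobi'.
    my @addresses = @{ $config->{server}{$name}{address} };
    push @summary, "$name: @addresses";
}
print "$_\n" for @summary;
```

       With 'address' forced to an array, the calling code no longer needs
       to check whether a server has one address or several.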
108
109       If you want to generate XML (for example to write a modified version of
110       $config back out as XML), check out "XMLout()".
111
112       If your needs are not so simple, this may not be the module for you.
113       In that case, you might want to read "WHERE TO FROM HERE?".
114

DESCRIPTION

       The XML::Simple module provides a simple API layer on top of an
       underlying XML parsing module (either XML::Parser or one of the SAX2
       parser modules).  Two functions are exported: "XMLin()" and "XMLout()".
       Note: you can explicitly request the lower case versions of the
       function names: "xml_in()" and "xml_out()".
121
122       The simplest approach is to call these two functions directly, but an
123       optional object oriented interface (see "OPTIONAL OO INTERFACE" below)
124       allows them to be called as methods of an XML::Simple object.  The
125       object interface can also be used at either end of a SAX pipeline.
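
       A short sketch of the object oriented style, assuming you want the
       same options applied to every call.  The sample XML is made up for
       illustration:

```perl
use strict;
use warnings;
use XML::Simple ();

# Options given to the constructor act as defaults for later method calls.
my $xs = XML::Simple->new(
    KeyAttr    => { server => 'name' },
    ForceArray => [ 'server' ],
);

# Parse: the single <server> is forced to an array, then folded on 'name'.
my $ref = $xs->XMLin('<config><server name="gobi" osname="irix" /></config>');
print "osname: $ref->{server}{gobi}{osname}\n";

# Generate: the same KeyAttr default is used to unfold the hash again.
my $xml = $xs->XMLout($ref, RootName => 'config');
print $xml;
```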
126
127   XMLin()
128       Parses XML formatted data and returns a reference to a data structure
129       which contains the same information in a more readily accessible form.
130       (Skip down to "EXAMPLES" below, for more sample code).
131
132       "XMLin()" accepts an optional XML specifier followed by zero or more
133       'name => value' option pairs.  The XML specifier can be one of the
134       following:
135
136       A filename
137           If the filename contains no directory components "XMLin()" will
138           look for the file in each directory in the SearchPath (see
139           "OPTIONS" below) or in the current directory if the SearchPath
140           option is not defined.  eg:
141
142             $ref = XMLin('/etc/params.xml');
143
144           Note, the filename '-' can be used to parse from STDIN.
145
146       undef
147           If there is no XML specifier, "XMLin()" will check the script
148           directory and each of the SearchPath directories for a file with
149           the same name as the script but with the extension '.xml'.  Note:
150           if you wish to specify options, you must specify the value 'undef'.
151           eg:
152
153             $ref = XMLin(undef, ForceArray => 1);
154
155       A string of XML
156           A string containing XML (recognised by the presence of '<' and '>'
157           characters) will be parsed directly.  eg:
158
159             $ref = XMLin('<opt username="bob" password="flurp" />');
160
161       An IO::Handle object
162           An IO::Handle object will be read to EOF and its contents parsed.
163           eg:
164
165             $fh = IO::File->new('/etc/params.xml');
166             $ref = XMLin($fh);
167
168   XMLout()
169       Takes a data structure (generally a hashref) and returns an XML
170       encoding of that structure.  If the resulting XML is parsed using
171       "XMLin()", it should return a data structure equivalent to the original
172       (see caveats below).
173
       The "XMLout()" function can also be used to output the XML as SAX
       events (see the "Handler" option and "SAX SUPPORT" for more details).
176
177       When translating hashes to XML, hash keys which have a leading '-' will
178       be silently skipped.  This is the approved method for marking elements
179       of a data structure which should be ignored by "XMLout".  (Note: If
180       these items were not skipped the key names would be emitted as element
181       or attribute names with a leading '-' which would not be valid XML).
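
       For illustration, here is a small sketch of the leading '-'
       convention.  The key name '-cached_at' is a hypothetical example of
       bookkeeping data you would not want in the XML:

```perl
use strict;
use warnings;
use XML::Simple;

my $server = {
    name         => 'sahara',
    osname       => 'solaris',
    '-cached_at' => time(),    # leading '-': skipped by XMLout()
};

my $server_xml = XMLout($server, RootName => 'server');
print $server_xml;
```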
182
183   Caveats
184       Some care is required in creating data structures which will be passed
185       to "XMLout()".  Hash keys from the data structure will be encoded as
186       either XML element names or attribute names.  Therefore, you should use
187       hash key names which conform to the relatively strict XML naming rules:
188
189       Names in XML must begin with a letter.  The remaining characters may be
190       letters, digits, hyphens (-), underscores (_) or full stops (.).  It is
191       also allowable to include one colon (:) in an element name but this
192       should only be used when working with namespaces (XML::Simple can only
193       usefully work with namespaces when teamed with a SAX Parser).
194
       You can use other punctuation characters in hash values (just not in
       hash keys); however, XML::Simple does not support dumping binary data.
197
198       If you break these rules, the current implementation of "XMLout()" will
199       simply emit non-compliant XML which will be rejected if you try to read
200       it back in.  (A later version of XML::Simple might take a more
201       proactive approach).
202
203       Note also that although you can nest hashes and arrays to arbitrary
204       levels, circular data structures are not supported and will cause
205       "XMLout()" to die.
206
207       If you wish to 'round-trip' arbitrary data structures from Perl to XML
208       and back to Perl, then you should probably disable array folding (using
209       the KeyAttr option) both with "XMLout()" and with "XMLin()".  If you
210       still don't get the expected results, you may prefer to use XML::Dumper
211       which is designed for exactly that purpose.
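
       A minimal round-trip sketch with array folding disabled in both
       directions.  The data structure is invented for the example, and
       'ForceArray' is added on the way back in so that a list stays a list
       even if it holds a single item:

```perl
use strict;
use warnings;
use XML::Simple;

my $original = {
    server => [
        { name => 'sahara', osname => 'solaris' },
        { name => 'gobi',   osname => 'irix'    },
    ],
};

# KeyAttr => [] turns off unfolding on output and folding on input.
my $xml_text = XMLout($original, KeyAttr => [], RootName => 'config');
my $copy     = XMLin($xml_text, KeyAttr => [], ForceArray => [ 'server' ]);

printf "%s runs %s\n", $_->{name}, $_->{osname} for @{ $copy->{server} };
```

       Note that all values come back as strings, so a numeric value in
       the original structure will be equal but not identical after the
       round trip.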
212
213       Refer to "WHERE TO FROM HERE?" if "XMLout()" is too simple for your
214       needs.
215

OPTIONS

217       XML::Simple supports a number of options (in fact as each release of
218       XML::Simple adds more options, the module's claim to the name 'Simple'
219       becomes increasingly tenuous).  If you find yourself repeatedly having
220       to specify the same options, you might like to investigate "OPTIONAL OO
221       INTERFACE" below.
222
223       If you can't be bothered reading the documentation, refer to "STRICT
224       MODE" to automatically catch common mistakes.
225
226       Because there are so many options, it's hard for new users to know
227       which ones are important, so here are the two you really need to know
228       about:
229
230       ·   check out "ForceArray" because you'll almost certainly want to turn
231           it on
232
233       ·   make sure you know what the "KeyAttr" option does and what its
234           default value is because it may surprise you otherwise (note in
235           particular that 'KeyAttr' affects both "XMLin" and "XMLout")
236
237       The option name headings below have a trailing 'comment' - a hash
238       followed by two pieces of metadata:
239
240       ·   Options are marked with 'in' if they are recognised by "XMLin()"
241           and 'out' if they are recognised by "XMLout()".
242
243       ·   Each option is also flagged to indicate whether it is:
244
245            'important'   - don't use the module until you understand this one
246            'handy'       - you can skip this on the first time through
247            'advanced'    - you can skip this on the second time through
248            'SAX only'    - don't worry about this unless you're using SAX (or
249                            alternatively if you need this, you also need SAX)
250            'seldom used' - you'll probably never use this unless you were the
251                            person that requested the feature
252
253       The options are listed alphabetically:
254
255       Note: option names are no longer case sensitive so you can use the
256       mixed case versions shown here; all lower case as required by versions
257       2.03 and earlier; or you can add underscores between the words (eg:
258       key_attr).
259
260   AttrIndent => 1 # out - handy
261       When you are using "XMLout()", enable this option to have attributes
262       printed one-per-line with sensible indentation rather than all on one
263       line.
264
265   Cache => [ cache schemes ] # in - advanced
266       Because loading the XML::Parser module and parsing an XML file can
267       consume a significant number of CPU cycles, it is often desirable to
268       cache the output of "XMLin()" for later reuse.
269
270       When parsing from a named file, XML::Simple supports a number of
271       caching schemes.  The 'Cache' option may be used to specify one or more
272       schemes (using an anonymous array).  Each scheme will be tried in turn
273       in the hope of finding a cached pre-parsed representation of the XML
274       file.  If no cached copy is found, the file will be parsed and the
275       first cache scheme in the list will be used to save a copy of the
276       results.  The following cache schemes have been implemented:
277
278       storable
279           Utilises Storable.pm to read/write a cache file with the same name
280           as the XML file but with the extension .stor
281
282       memshare
283           When a file is first parsed, a copy of the resulting data structure
284           is retained in memory in the XML::Simple module's namespace.
285           Subsequent calls to parse the same file will return a reference to
286           this structure.  This cached version will persist only for the life
287           of the Perl interpreter (which in the case of mod_perl for example,
288           may be some significant time).
289
290           Because each caller receives a reference to the same data
291           structure, a change made by one caller will be visible to all.  For
292           this reason, the reference returned should be treated as read-only.
293
294       memcopy
295           This scheme works identically to 'memshare' (above) except that
296           each caller receives a reference to a new data structure which is a
297           copy of the cached version.  Copying the data structure will add a
298           little processing overhead, therefore this scheme should only be
299           used where the caller intends to modify the data structure (or
300           wishes to protect itself from others who might).  This scheme uses
301           Storable.pm to perform the copy.
302
303       Warning! The memory-based caching schemes compare the timestamp on the
304       file to the time when it was last parsed.  If the file is stored on an
305       NFS filesystem (or other network share) and the clock on the file
306       server is not exactly synchronised with the clock where your script is
307       run, updates to the source XML file may appear to be ignored.
308
309   ContentKey => 'keyname' # in+out - seldom used
       When text content is parsed to a hash value, this option lets you
       specify a name for the hash key to override the default 'content'.  So
       for example:
313
314         XMLin('<opt one="1">Text</opt>', ContentKey => 'text')
315
316       will parse to:
317
318         { 'one' => 1, 'text' => 'Text' }
319
320       instead of:
321
322         { 'one' => 1, 'content' => 'Text' }
323
324       "XMLout()" will also honour the value of this option when converting a
325       hashref to XML.
326
327       You can also prefix your selected key name with a '-' character to have
328       "XMLin()" try a little harder to eliminate unnecessary 'content' keys
329       after array folding.  For example:
330
331         XMLin(
332           '<opt><item name="one">First</item><item name="two">Second</item></opt>',
333           KeyAttr => {item => 'name'},
334           ForceArray => [ 'item' ],
335           ContentKey => '-content'
336         )
337
338       will parse to:
339
         {
           'item' => {
             'one' =>  'First',
             'two' =>  'Second'
           }
         }
346
347       rather than this (without the '-'):
348
         {
           'item' => {
             'one' => { 'content' => 'First' },
             'two' => { 'content' => 'Second' }
           }
         }
355
356   DataHandler => code_ref # in - SAX only
357       When you use an XML::Simple object as a SAX handler, it will return a
358       'simple tree' data structure in the same format as "XMLin()" would
359       return.  If this option is set (to a subroutine reference), then when
360       the tree is built the subroutine will be called and passed two
361       arguments: a reference to the XML::Simple object and a reference to the
362       data tree.  The return value from the subroutine will be returned to
363       the SAX driver.  (See "SAX SUPPORT" for more details).
364
365   ForceArray => 1 # in - important
366       This option should be set to '1' to force nested elements to be
367       represented as arrays even when there is only one.  Eg, with ForceArray
368       enabled, this XML:
369
370           <opt>
371             <name>value</name>
372           </opt>
373
374       would parse to this:
375
376           {
377             'name' => [
378                         'value'
379                       ]
380           }
381
382       instead of this (the default):
383
384           {
385             'name' => 'value'
386           }
387
388       This option is especially useful if the data structure is likely to be
389       written back out as XML and the default behaviour of rolling single
390       nested elements up into attributes is not desirable.
391
       If you are using the array folding feature, you should almost certainly
       enable this option.  If you do not, single nested elements will not be
       parsed to arrays and therefore will not be candidates for folding to a
       hash.  (Given that the default value of 'KeyAttr' enables array
       folding, this option should probably have been enabled by default
       too - sorry).
398
399   ForceArray => [ names ] # in - important
400       This alternative (and preferred) form of the 'ForceArray' option allows
401       you to specify a list of element names which should always be forced
402       into an array representation, rather than the 'all or nothing' approach
403       above.
404
405       It is also possible (since version 2.05) to include compiled regular
406       expressions in the list - any element names which match the pattern
407       will be forced to arrays.  If the list contains only a single regex,
408       then it is not necessary to enclose it in an arrayref.  Eg:
409
410         ForceArray => qr/_list$/
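
       As a sketch, literal names and compiled patterns can be mixed in
       the same list.  The element names here are made up for the example:

```perl
use strict;
use warnings;
use XML::Simple;

my $listy = XMLin(
    '<opt><host_list>alpha</host_list><alias>a1</alias><port>80</port></opt>',
    ForceArray => [ 'alias', qr/_list$/ ],
);

# 'host_list' matches the regex and 'alias' is named explicitly, so both
# become arrays; 'port' matches neither and stays a plain scalar.
print "host_list is an array\n" if ref $listy->{host_list} eq 'ARRAY';
print "alias is an array\n"     if ref $listy->{alias}     eq 'ARRAY';
print "port is a scalar\n"      if !ref $listy->{port};
```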
411
412   ForceContent => 1 # in - seldom used
413       When "XMLin()" parses elements which have text content as well as
414       attributes, the text content must be represented as a hash value rather
415       than a simple scalar.  This option allows you to force text content to
416       always parse to a hash value even when there are no attributes.  So for
417       example:
418
419         XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)
420
421       will parse to:
422
423         {
424           'x' => {           'content' => 'text1' },
425           'y' => { 'a' => 2, 'content' => 'text2' }
426         }
427
428       instead of:
429
430         {
431           'x' => 'text1',
432           'y' => { 'a' => 2, 'content' => 'text2' }
433         }
434
435   GroupTags => { grouping tag => grouped tag } # in+out - handy
436       You can use this option to eliminate extra levels of indirection in
437       your Perl data structure.  For example this XML:
438
         <opt>
           <searchpath>
             <dir>/usr/bin</dir>
             <dir>/usr/local/bin</dir>
             <dir>/usr/X11/bin</dir>
           </searchpath>
         </opt>
446
447       Would normally be read into a structure like this:
448
449         {
450           searchpath => {
451                           dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
452                         }
453         }
454
455       But when read in with the appropriate value for 'GroupTags':
456
457         my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });
458
459       It will return this simpler structure:
460
461         {
462           searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
463         }
464
465       The grouping element ("<searchpath>" in the example) must not contain
466       any attributes or elements other than the grouped element.
467
468       You can specify multiple 'grouping element' to 'grouped element'
469       mappings in the same hashref.  If this option is combined with
470       "KeyAttr", the array folding will occur first and then the grouped
471       element names will be eliminated.
472
473       "XMLout" will also use the grouptag mappings to re-introduce the tags
474       around the grouped elements.  Beware though that this will occur in all
475       places that the 'grouping tag' name occurs - you probably don't want to
476       use the same name for elements as well as attributes.
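
       The output direction can be sketched like this, reusing the
       searchpath/dir mapping from the example above.  The variable names
       are illustrative:

```perl
use strict;
use warnings;
use XML::Simple;

my %group = ( searchpath => 'dir' );
my $paths = { searchpath => [ '/usr/bin', '/usr/local/bin' ] };

# On output, each path is wrapped in <dir> inside a single <searchpath>.
my $grouped_xml = XMLout($paths, GroupTags => \%group);
print $grouped_xml;

# On input, the same mapping removes the intermediate 'dir' level again.
my $back = XMLin($grouped_xml, GroupTags => \%group);
print join(':', @{ $back->{searchpath} }), "\n";
```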
477
478   Handler => object_ref # out - SAX only
479       Use the 'Handler' option to have "XMLout()" generate SAX events rather
480       than returning a string of XML.  For more details see "SAX SUPPORT"
481       below.
482
483       Note: the current implementation of this option generates a string of
484       XML and uses a SAX parser to translate it into SAX events.  The normal
485       encoding rules apply here - your data must be UTF8 encoded unless you
486       specify an alternative encoding via the 'XMLDecl' option; and by the
487       time the data reaches the handler object, it will be in UTF8 form
488       regardless of the encoding you supply.  A future implementation of this
489       option may generate the events directly.
490
491   KeepRoot => 1 # in+out - handy
492       In its attempt to return a data structure free of superfluous detail
493       and unnecessary levels of indirection, "XMLin()" normally discards the
494       root element name.  Setting the 'KeepRoot' option to '1' will cause the
495       root element name to be retained.  So after executing this code:
496
497         $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)
498
499       You'll be able to reference the tempdir as
500       "$config->{config}->{tempdir}" instead of the default
501       "$config->{tempdir}".
502
503       Similarly, setting the 'KeepRoot' option to '1' will tell "XMLout()"
504       that the data structure already contains a root element name and it is
505       not necessary to add another.
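
       A brief sketch of the difference on output, using the same 'config'
       data as above:

```perl
use strict;
use warnings;
use XML::Simple;

my $tree = { config => { tempdir => '/tmp' } };

# With KeepRoot, the single top-level key supplies the root element name.
my $with_root = XMLout($tree, KeepRoot => 1);

# Without it, the whole structure is wrapped in the default 'opt' root.
my $without_root = XMLout($tree);

print $with_root, $without_root;
```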
506
507   KeyAttr => [ list ] # in+out - important
508       This option controls the 'array folding' feature which translates
509       nested elements from an array to a hash.  It also controls the
510       'unfolding' of hashes to arrays.
511
512       For example, this XML:
513
514           <opt>
515             <user login="grep" fullname="Gary R Epstein" />
516             <user login="stty" fullname="Simon T Tyson" />
517           </opt>
518
519       would, by default, parse to this:
520
521           {
522             'user' => [
523                         {
524                           'login' => 'grep',
525                           'fullname' => 'Gary R Epstein'
526                         },
527                         {
528                           'login' => 'stty',
529                           'fullname' => 'Simon T Tyson'
530                         }
531                       ]
532           }
533
534       If the option 'KeyAttr => "login"' were used to specify that the
535       'login' attribute is a key, the same XML would parse to:
536
537           {
538             'user' => {
539                         'stty' => {
540                                     'fullname' => 'Simon T Tyson'
541                                   },
542                         'grep' => {
543                                     'fullname' => 'Gary R Epstein'
544                                   }
545                       }
546           }
547
548       The key attribute names should be supplied in an arrayref if there is
549       more than one.  "XMLin()" will attempt to match attribute names in the
550       order supplied.  "XMLout()" will use the first attribute name supplied
551       when 'unfolding' a hash into an array.
552
       Note 1: The default value for 'KeyAttr' is ['name', 'key', 'id'].  If
       you do not want folding on input or unfolding on output you must
       set this option to an empty list to disable the feature.
556
557       Note 2: If you wish to use this option, you should also enable the
558       "ForceArray" option.  Without 'ForceArray', a single nested element
559       will be rolled up into a scalar rather than an array and therefore will
560       not be folded (since only arrays get folded).
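
       The effect described in Note 2 can be seen in this sketch, which
       parses the same single-user document with and without 'ForceArray':

```perl
use strict;
use warnings;
use XML::Simple;

my $single_xml = '<opt><user login="grep" fullname="Gary R Epstein" /></opt>';

# Without ForceArray the lone <user> is a plain hash, so nothing is folded.
my $plain = XMLin($single_xml, KeyAttr => [ 'login' ]);
print "not folded: $plain->{user}{login}\n";

# With ForceArray the <user> becomes a one-element array, which is folded.
my $folded = XMLin($single_xml, KeyAttr => [ 'login' ], ForceArray => [ 'user' ]);
print "folded: $folded->{user}{grep}{fullname}\n";
```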
561
562   KeyAttr => { list } # in+out - important
       This alternative (and preferred) method of specifying the key
       attributes allows more fine grained control over which elements are
       folded and on which attributes.  For example the option 'KeyAttr => {
       package => "id" }' will cause any package elements to be folded on the
       'id' attribute.  No other elements which have an 'id' attribute will be
       folded at all.
569
570       Note: "XMLin()" will generate a warning (or a fatal error in "STRICT
571       MODE") if this syntax is used and an element which does not have the
572       specified key attribute is encountered (eg: a 'package' element without
573       an 'id' attribute, to use the example above).  Warnings will only be
574       generated if -w is in force.
575
576       Two further variations are made possible by prefixing a '+' or a '-'
577       character to the attribute name:
578
579       The option 'KeyAttr => { user => "+login" }' will cause this XML:
580
581           <opt>
582             <user login="grep" fullname="Gary R Epstein" />
583             <user login="stty" fullname="Simon T Tyson" />
584           </opt>
585
586       to parse to this data structure:
587
588           {
589             'user' => {
590                         'stty' => {
591                                     'fullname' => 'Simon T Tyson',
592                                     'login'    => 'stty'
593                                   },
594                         'grep' => {
595                                     'fullname' => 'Gary R Epstein',
596                                     'login'    => 'grep'
597                                   }
598                       }
599           }
600
601       The '+' indicates that the value of the key attribute should be copied
602       rather than moved to the folded hash key.
603
604       A '-' prefix would produce this result:
605
606           {
607             'user' => {
608                         'stty' => {
609                                     'fullname' => 'Simon T Tyson',
610                                     '-login'    => 'stty'
611                                   },
612                         'grep' => {
613                                     'fullname' => 'Gary R Epstein',
614                                     '-login'    => 'grep'
615                                   }
616                       }
617           }
618
619       As described earlier, "XMLout" will ignore hash keys starting with a
620       '-'.
621
622   NoAttr => 1 # in+out - handy
623       When used with "XMLout()", the generated XML will contain no
624       attributes.  All hash key/values will be represented as nested elements
625       instead.
626
627       When used with "XMLin()", any attributes in the XML will be ignored.
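
       Both directions are sketched below; the data and XML are invented
       for the example:

```perl
use strict;
use warnings;
use XML::Simple;

# Output: scalar values become nested elements instead of attributes.
my $elements_only = XMLout(
    { logdir => '/var/log/foo/', debug => 1 },
    NoAttr   => 1,
    RootName => 'config',
);
print $elements_only;

# Input: the 'a' attribute is dropped, the nested element <b> is kept.
my $no_attrs = XMLin('<opt a="1"><b>2</b></opt>', NoAttr => 1);
print join(',', sort keys %$no_attrs), "\n";
```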
628
629   NoEscape => 1 # out - seldom used
       By default, "XMLout()" will translate the characters '<', '>', '&' and
       '"' to '&lt;', '&gt;', '&amp;' and '&quot;' respectively.  Use this
       option to suppress escaping (presumably because you've already escaped
       the data in some more sophisticated manner).
634
635   NoIndent => 1 # out - seldom used
636       Set this option to 1 to disable "XMLout()"'s default 'pretty printing'
637       mode.  With this option enabled, the XML output will all be on one line
638       (unless there are newlines in the data) - this may be easier for
639       downstream processing.
640
641   NoSort => 1 # out - seldom used
642       Newer versions of XML::Simple sort elements and attributes
643       alphabetically (*), by default.  Enable this option to suppress the
644       sorting - possibly for backwards compatibility.
645
646       * Actually, sorting is alphabetical but 'key' attribute or element
647       names (as in 'KeyAttr') sort first.  Also, when a hash of hashes is
648       'unfolded', the elements are sorted alphabetically by the value of the
649       key field.
650
651   NormaliseSpace => 0 | 1 | 2 # in - handy
652       This option controls how whitespace in text content is handled.
653       Recognised values for the option are:
654
655       ·   0 = (default) whitespace is passed through unaltered (except of
656           course for the normalisation of whitespace in attribute values
657           which is mandated by the XML recommendation)
658
659       ·   1 = whitespace is normalised in any value used as a hash key
660           (normalising means removing leading and trailing whitespace and
661           collapsing sequences of whitespace characters to a single space)
662
663       ·   2 = whitespace is normalised in all text content
664
665       Note: you can spell this option with a 'z' if that is more natural for
666       you.
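
       A small sketch comparing the default with level 2, using a made-up
       element whose text content has stray whitespace:

```perl
use strict;
use warnings;
use XML::Simple;

my $spacey = '<opt><msg>  hello   world  </msg></opt>';

my $raw  = XMLin($spacey);                       # default: NormaliseSpace => 0
my $tidy = XMLin($spacey, NormaliseSpace => 2);  # normalise all text content

print "[$raw->{msg}]\n";
print "[$tidy->{msg}]\n";
```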
667
668   NSExpand => 1 # in+out handy - SAX only
669       This option controls namespace expansion - the translation of element
670       and attribute names of the form 'prefix:name' to '{uri}name'.  For
671       example the element name 'xsl:template' might be expanded to:
672       '{http://www.w3.org/1999/XSL/Transform}template'.
673
       By default, "XMLin()" will return element names and attribute names
       exactly as they appear in the XML.  Setting this option to 1 will cause
       all element and attribute names to be expanded to include their
       namespace URI in place of the prefix.
678
679       Note: You must be using a SAX parser for this option to work (ie: it
680       does not work with XML::Parser).
681
682       This option also controls whether "XMLout()" performs the reverse
683       translation from '{uri}name' back to 'prefix:name'.  The default is no
684       translation.  If your data contains expanded names, you should set this
685       option to 1 otherwise "XMLout" will emit XML which is not well formed.
686
687       Note: You must have the XML::NamespaceSupport module installed if you
688       want "XMLout()" to translate URIs back to prefixes.
689
690   NumericEscape => 0 | 1 | 2 # out - handy
691       Use this option to have 'high' (non-ASCII) characters in your Perl data
692       structure converted to numeric entities (eg: &#8364;) in the XML
693       output.  Three levels are possible:
694
695       0 - default: no numeric escaping (OK if you're writing out UTF8)
696
697       1 - only characters above 0xFF are escaped (ie: characters in the
698       0x80-FF range are not escaped), possibly useful with ISO8859-1 output
699
700       2 - all characters above 0x7F are escaped (good for plain ASCII output)
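
       The three levels can be compared with a sketch like the following.
       The sample string contains one character in the 0x80-0xFF range
       (e-acute, 0xE9) and one above it (the euro sign, 0x20AC):

```perl
use strict;
use warnings;
use XML::Simple;

my $menu = { note => "caf\x{e9} costs \x{20ac}5" };

# Level 2: everything above 0x7F is escaped, giving pure ASCII output.
my $ascii = XMLout($menu, RootName => 'menu', NumericEscape => 2);

# Level 1: only the euro sign (above 0xFF) is escaped.
my $latin1 = XMLout($menu, RootName => 'menu', NumericEscape => 1);

print $ascii;   # safe to print anywhere: contains only ASCII characters
```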
701
702   OutputFile => <file specifier> # out - handy
703       The default behaviour of "XMLout()" is to return the XML as a string.
704       If you wish to write the XML to a file, simply supply the filename
705       using the 'OutputFile' option.
706
707       This option also accepts an IO handle object - especially useful in
708       Perl 5.8.0 and later for output using an encoding other than UTF-8, eg:
709
710         open my $fh, '>:encoding(iso-8859-1)', $path or die "open($path): $!";
711         XMLout($ref, OutputFile => $fh);
712
713       Note, XML::Simple does not require that the object you pass in to the
714       OutputFile option inherits from IO::Handle - it simply assumes the
715       object supports a "print" method.
716
717   ParserOpts => [ XML::Parser Options ] # in - don't use this
718       Note: This option is now officially deprecated.  If you find it useful,
719       email the author with an example of what you use it for.  Do not use
720       this option to set the ProtocolEncoding, that's just plain wrong - fix
721       the XML.
722
723       This option allows you to pass parameters to the constructor of the
724       underlying XML::Parser object (which of course assumes you're not using
725       SAX).
726
727   RootName => 'string' # out - handy
728       By default, when "XMLout()" generates XML, the root element will be
729       named 'opt'.  This option allows you to specify an alternative name.
730
731       Specifying either undef or the empty string for the RootName option
732       will produce XML with no root elements.  In most cases the resulting
733       XML fragment will not be 'well formed' and therefore could not be read
734       back in by "XMLin()".  Nevertheless, the option has been found to be
735       useful in certain circumstances.
736
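       A short sketch of both cases, using made-up data.  KeyAttr is set to
       an empty list here only to keep array folding out of the picture.

```perl
use strict;
use warnings;
use XML::Simple;

my $data = { server => { osname => 'linux', address => '10.0.0.103' } };

# Root element named 'config' instead of the default 'opt'.
my $named = XMLout($data, RootName => 'config', KeyAttr => []);

# No root element: the result is an XML fragment, not a document.
my $fragment = XMLout($data, RootName => undef, KeyAttr => []);

print $named, "----\n", $fragment;
```
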
737   SearchPath => [ list ] # in - handy
738       If you pass "XMLin()" a filename, but the filename includes no directory
739       component, you can use this option to specify which directories should
740       be searched to locate the file.  You might use this option to search
741       first in the user's home directory, then in a global directory such as
742       /etc.
743
744       If a filename is provided to "XMLin()" but SearchPath is not defined,
745       the file is assumed to be in the current directory.
746
747       If the first parameter to "XMLin()" is undefined, the default
748       SearchPath will contain only the directory in which the script itself
749       is located.  Otherwise the default SearchPath will be empty.
750
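       The lookup order can be tried out with a sketch like this one.  The
       two temporary directories stand in for a per-user and a global
       location, and the file name is invented for the example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::Simple;

# Only the second directory holds the file.
my $user_dir   = tempdir(CLEANUP => 1);
my $global_dir = tempdir(CLEANUP => 1);

my $name = 'searchpath_demo.xml';     # hypothetical file name
open my $fh, '>', File::Spec->catfile($global_dir, $name)
    or die "open: $!";
print {$fh} '<config logdir="/var/log/foo/" />';
close $fh or die "close: $!";

# The name has no directory component, so each SearchPath entry is tried
# in order until the file is found.
my $config = XMLin($name, SearchPath => [ $user_dir, $global_dir ]);

print "logdir: $config->{logdir}\n";
```
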
751   SuppressEmpty => 1 | '' | undef # in+out - handy
752       This option controls what "XMLin()" should do with empty elements (no
753       attributes and no content).  The default behaviour is to represent them
754       as empty hashes.  Setting this option to a true value (eg: 1) will
755       cause empty elements to be skipped altogether.  Setting the option to
756       'undef' or the empty string will cause empty elements to be represented
757       as the undefined value or the empty string respectively.  The latter
758       two alternatives are a little easier to test for in your code than a
759       hash with no keys.
760
761       The option also controls what "XMLout()" does with undefined values.
762       Setting the option to undef causes undefined values to be output as
763       empty elements (rather than empty attributes); it also suppresses the
764       generation of warnings about undefined values.  Setting the option to a
765       true value (eg: 1) causes undefined values to be skipped altogether on
766       output.
767
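       The four possible treatments of an empty element on input are easiest
       to compare side by side.  The XML below is an invented example:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><title>Report</title><note /></opt>';

my $default = XMLin($xml);                          # note => {}
my $skipped = XMLin($xml, SuppressEmpty => 1);      # no 'note' key at all
my $as_str  = XMLin($xml, SuppressEmpty => '');     # note => ''
my $as_und  = XMLin($xml, SuppressEmpty => undef);  # note => undef

print exists $skipped->{note} ? "note kept\n" : "note skipped\n";
```
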
768   ValueAttr => [ names ] # in - handy
769       Use this option to deal with elements which always have a single
770       attribute and no content.  Eg:
771
772         <opt>
773           <colour value="red" />
774           <size   value="XXL" />
775         </opt>
776
777       Setting "ValueAttr => [ 'value' ]" will cause the above XML to parse
778       to:
779
780         {
781           colour => 'red',
782           size   => 'XXL'
783         }
784
785       instead of this (the default):
786
787         {
788           colour => { value => 'red' },
789           size   => { value => 'XXL' }
790         }
791
792       Note: This form of the ValueAttr option is not compatible with
793       "XMLout()" - since the attribute name is discarded at parse time, the
794       original XML cannot be reconstructed.
795
796   ValueAttr => { element => attribute, ... } # in+out - handy
797       This (preferred) form of the ValueAttr option requires you to specify
798       both the element and the attribute names.  This is not only safer, it
799       also allows the original XML to be reconstructed by "XMLout()".
800
801       Note: You probably don't want to use this option and the NoAttr option
802       at the same time.
803
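       A round trip with the hash form might look like the sketch below,
       which reuses the colour/size XML from the previous section and passes
       the same option to both routines:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><colour value="red" /><size value="XXL" /></opt>';

# Element name => attribute name, shared by XMLin() and XMLout().
my %opts = (ValueAttr => { colour => 'value', size => 'value' });

my $data = XMLin($xml, %opts);     # { colour => 'red', size => 'XXL' }
my $back = XMLout($data, %opts);   # value="..." attributes are put back

print $back;
```
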
804   Variables => { name => value } # in - handy
805       This option allows variables in the XML to be expanded when the file is
806       read.  (There is no facility for putting the variable names back if you
807       regenerate XML using "XMLout()".)
808
809       A 'variable' is any text of the form "${name}" which occurs in an
810       attribute value or in the text content of an element.  If 'name'
811       matches a key in the supplied hashref, "${name}" will be replaced with
812       the corresponding value from the hashref.  If no matching key is found,
813       the variable will not be replaced.  Names must match the regex:
814       "[\w.]+" (ie: only 'word' characters and dots are allowed).
815
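       For instance, in the following sketch (names and paths are invented)
       two variables are supplied and a third is deliberately left
       undefined:

```perl
use strict;
use warnings;
use XML::Simple;

# Single quotes stop Perl from interpolating ${...} itself.
my $xml = '<opt>
  <logdir>${base}/log</logdir>
  <pidfile path="${base}/run/${app}.pid" />
  <motd>${no_such_name}</motd>
</opt>';

my $config = XMLin($xml, Variables => { base => '/opt/demo', app => 'foo' });

print "$config->{logdir}\n";          # text content, expanded
print "$config->{pidfile}{path}\n";   # attribute value, expanded
print "$config->{motd}\n";            # no matching key, left as it was
```
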
816   VarAttr => 'attr_name' # in - handy
817       In addition to the variables defined using "Variables", this option
818       allows variables to be defined in the XML.  A variable definition
819       consists of an element with an attribute called 'attr_name' (the value
820       of the "VarAttr" option).  The value of the attribute will be used as
821       the variable name and the text content of the element will be used as
822       the value.  A variable defined in this way will override a variable
823       defined using the "Variables" option.  For example:
824
825         XMLin( '<opt>
826                   <dir name="prefix">/usr/local/apache</dir>
827                   <dir name="exec_prefix">${prefix}</dir>
828                   <dir name="bindir">${exec_prefix}/bin</dir>
829                 </opt>',
830                VarAttr => 'name', ContentKey => '-content'
831               );
832
833       produces the following data structure:
834
835         {
836           dir => {
837                    prefix      => '/usr/local/apache',
838                    exec_prefix => '/usr/local/apache',
839                    bindir      => '/usr/local/apache/bin',
840                  }
841         }
842
843   XMLDecl => 1  or  XMLDecl => 'string'  # out - handy
844       If you want the output from "XMLout()" to start with the optional XML
845       declaration, simply set the option to '1'.  The default XML declaration
846       is:
847
848               <?xml version='1.0' standalone='yes'?>
849
850       If you want some other string (for example to declare an encoding
851       value), set the value of this option to the complete string you
852       require.
853
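       Both forms are shown in this small sketch.  Note that a custom
       declaration is simply placed in front of the output as given;
       declaring an encoding this way does not by itself re-encode the
       string returned by "XMLout()".

```perl
use strict;
use warnings;
use XML::Simple;

my $data = { colour => 'red' };

# Default declaration.
my $with_default = XMLout($data, XMLDecl => 1);

# Caller-supplied declaration (example string only).
my $decl        = q{<?xml version="1.0" encoding="iso-8859-1"?>};
my $with_custom = XMLout($data, XMLDecl => $decl);

print $with_default, $with_custom;
```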

OPTIONAL OO INTERFACE

855       The procedural interface is both simple and convenient; however, there
856       are a couple of reasons why you might prefer to use the object oriented
857       (OO) interface:
858
859       ·   to define a set of default values which should be used on all
860           subsequent calls to "XMLin()" or "XMLout()"
861
862       ·   to override methods in XML::Simple to provide customised behaviour
863
864       The default values for the options described above are unlikely to suit
865       everyone.  The OO interface allows you to effectively override
866       XML::Simple's defaults with your preferred values.  It works like this:
867
868       First create an XML::Simple parser object with your preferred defaults:
869
870         my $xs = XML::Simple->new(ForceArray => 1, KeepRoot => 1);
871
872       then call "XMLin()" or "XMLout()" as a method of that object:
873
874         my $ref = $xs->XMLin($xml);
875         my $xml = $xs->XMLout($ref);
876
877       You can also specify options when you make the method calls and these
878       values will be merged with the values specified when the object was
879       created.  Values specified in a method call take precedence.
880
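       A sketch of how the merging plays out, with made-up XML.  The
       per-call override applies to that call only; the object's defaults
       are used again on the next call.

```perl
use strict;
use warnings;
use XML::Simple;

my $xs  = XML::Simple->new(ForceArray => 1, KeyAttr => []);
my $xml = '<opt><item>a</item></opt>';

my $with_defaults = $xs->XMLin($xml);                   # { item => ['a'] }
my $overridden    = $xs->XMLin($xml, ForceArray => 0);  # { item => 'a' }
my $again         = $xs->XMLin($xml);                   # defaults once more

print ref($with_defaults->{item}), " / $overridden->{item}\n";
```
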
881       Note: when called as methods, the "XMLin()" and "XMLout()" routines may
882       be called as "xml_in()" or "xml_out()".  The method names are aliased
883       so the only difference is the aesthetics.
884
885   Parsing Methods
886       You can explicitly call one of the following methods rather than rely
887       on the "xml_in()" method automatically determining whether the target
888       to be parsed is a string, a file or a filehandle:
889
890       parse_string(text)
891           Works exactly like the "xml_in()" method but assumes the first
892           argument is a string of XML (or a reference to a scalar containing
893           a string of XML).
894
895       parse_file(filename)
896           Works exactly like the "xml_in()" method but assumes the first
897           argument is the name of a file containing XML.
898
899       parse_fh(file_handle)
900           Works exactly like the "xml_in()" method but assumes the first
901           argument is a filehandle which can be read to get XML.
902
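       A minimal sketch using two of these methods.  The temporary file is
       created only so that there is something for "parse_file()" to read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Simple;

my $xs = XML::Simple->new(KeyAttr => [], ForceArray => 0);

# The argument is always treated as XML text, never as a file name.
my $from_string = $xs->parse_string('<opt colour="red" />');

# The argument is always treated as a file name.
my ($fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<opt colour="blue" />';
close $fh or die "close: $!";
my $from_file = $xs->parse_file($path);

print "$from_string->{colour} $from_file->{colour}\n";
```
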
903   Hook Methods
904       You can make your own class which inherits from XML::Simple and
905       overrides certain behaviours.  The following methods may provide useful
906       'hooks' upon which to hang your modified behaviour.  You may find other
907       undocumented methods by examining the source, but those may be subject
908       to change in future releases.
909
910       handle_options(direction, name => value ...)
911           This method will be called when one of the parsing methods or the
912           "XMLout()" method is called.  The initial argument will be a string
913           (either 'in' or 'out') and the remaining arguments will be name
914           value pairs.
915
916       default_config_file()
917           Calculates and returns the name of the file which should be parsed
918           if no filename is passed to "XMLin()" (default: "$0.xml").
919
920       build_simple_tree(filename, string)
921           Called from "XMLin()" or any of the parsing methods.  Takes either
922           a file name as the first argument or "undef" followed by a 'string'
923           as the second argument.  Returns a simple tree data structure.  You
924           could override this method to apply your own transformations before
925           the data structure is returned to the caller.
926
927       new_hashref()
928           When the 'simple tree' data structure is being built, this method
929           will be called to create any required anonymous hashrefs.
930
931       sorted_keys(name, hashref)
932           Called when "XMLout()" is translating a hashref to XML.  This
933           routine returns a list of hash keys in the order that the
934           corresponding attributes/elements should appear in the output.
935
936       escape_value(string)
937           Called from "XMLout()", takes a string and returns a copy of the
938           string with XML character escaping rules applied.
939
940       numeric_escape(string)
941           Called from "escape_value()", to handle non-ASCII characters
942           (depending on the value of the NumericEscape option).
943
944       copy_hash(hashref, extra_key => value, ...)
945           Called from "XMLout()", when 'unfolding' a hash of hashes into an
946           array of hashes.  You might wish to override this method if you're
947           using tied hashes and don't want them to get untied.
948
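       As a sketch of the subclassing approach, the hypothetical class below
       overrides "sorted_keys()" so that output appears in reverse
       alphabetical order.  The class name and the ordering rule are
       invented for the example.

```perl
use strict;
use warnings;
use XML::Simple ();

# Hypothetical subclass: reverse alphabetical order on output.
package My::ReverseSimple;
our @ISA = ('XML::Simple');

sub sorted_keys {
    my ($self, $name, $hashref) = @_;
    return reverse sort keys %$hashref;
}

package main;

my $xml = My::ReverseSimple->new->XMLout({ alpha => 1, beta => 2 });
print $xml;
```
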
949   Cache Methods
950       XML::Simple implements three caching schemes ('storable', 'memshare'
951       and 'memcopy').  You can implement a custom caching scheme by
952       implementing two methods - one for reading from the cache and one for
953       writing to it.
954
955       For example, you might implement a new 'dbm' scheme that stores cached
956       data structures using the MLDBM module.  First, you would add a
957       "cache_read_dbm()" method which accepted a filename for use as a lookup
958       key and returned a data structure on success, or undef on failure.
959       Then, you would implement a "cache_write_dbm()" method which accepted a
960       data structure and a filename.
961
962       You would use this caching scheme by specifying the option:
963
964         Cache => [ 'dbm' ]
965
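       The same pattern can be sketched without MLDBM.  The 'demo' scheme
       below is hypothetical and keeps parsed trees in an in-process hash;
       it assumes the read method is given the file name and the write
       method the data structure followed by the file name, as described
       above.  A real scheme would also need to notice when the file
       changes on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Simple ();

# Hypothetical 'demo' caching scheme backed by a plain hash.
package My::CachingSimple;
our @ISA = ('XML::Simple');

my %store;
my $hits = 0;

sub cache_read_demo {
    my ($self, $filename) = @_;
    return undef unless exists $store{$filename};
    $hits++;
    return $store{$filename};
}

sub cache_write_demo {
    my ($self, $data, $filename) = @_;
    $store{$filename} = $data;
    return;
}

package main;

my ($fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<opt colour="red" />';
close $fh or die "close: $!";

my $xs     = My::CachingSimple->new(Cache => [ 'demo' ]);
my $first  = $xs->XMLin($path);   # parsed, then handed to cache_write_demo()
my $second = $xs->XMLin($path);   # expected to come from cache_read_demo()

print "cache hits: $hits\n";
```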

STRICT MODE

967       If you import the XML::Simple routines like this:
968
969         use XML::Simple qw(:strict);
970
971       the following common mistakes will be detected and treated as fatal
972       errors:
973
974       ·   Failing to explicitly set the "KeyAttr" option - if you can't be
975           bothered reading about this option, turn it off with: KeyAttr => [
976           ]
977
978       ·   Failing to explicitly set the "ForceArray" option - if you can't be
979           bothered reading about this option, set it to the safest mode with:
980           ForceArray => 1
981
982       ·   Setting ForceArray to an array, but failing to list all the
983           elements from the KeyAttr hash.
984
985       ·   Data error - KeyAttr is set to say { part => 'partnum' } but the
986           XML contains one or more <part> elements without a 'partnum'
987           attribute (or nested element).  Note: if strict mode is not set but
988           -w is, this condition triggers a warning.
989
990       ·   Data error - as above, but non-unique values are present in the key
991           attribute (eg: more than one <part> element with the same partnum).
992           This will also trigger a warning if strict mode is not enabled.
993
994       ·   Data error - as above, but value of key attribute (eg: partnum) is
995           not a scalar string (due to nested elements etc).  This will also
996           trigger a warning if strict mode is not enabled.
997
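       The first two checks can be seen in a sketch like the following.
       The XML is invented; the exact wording of the error message is not
       relied upon here.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><part partnum="555-1234" desc="widget" /></opt>';

# Neither KeyAttr nor ForceArray is given, so strict mode should refuse.
my $ref   = eval { XMLin($xml) };
my $error = $@;
print "rejected: $error" if $error;

# Both options are set explicitly, and the element named in KeyAttr is
# also listed in ForceArray.
my $parts = XMLin($xml,
    KeyAttr    => { part => 'partnum' },
    ForceArray => [ 'part' ],
);
print $parts->{part}{'555-1234'}{desc}, "\n";
```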

SAX SUPPORT

999       From version 1.08_01, XML::Simple includes support for SAX (the Simple
1000       API for XML) - specifically SAX2.
1001
1002       In a typical SAX application, an XML parser (or SAX 'driver') module
1003       generates SAX events (start of element, character data, end of element,
1004       etc) as it parses an XML document and a 'handler' module processes the
1005       events to extract the required data.  This simple model allows for some
1006       interesting and powerful possibilities:
1007
1008       ·   Applications written to the SAX API can extract data from huge XML
1009           documents without the memory overheads of a DOM or tree API.
1010
1011       ·   The SAX API allows for plug and play interchange of parser modules
1012           without having to change your code to fit a new module's API.  A
1013           number of SAX parsers are available with capabilities ranging from
1014           extreme portability to blazing performance.
1015
1016       ·   A SAX 'filter' module can implement both a handler interface for
1017           receiving data and a generator interface for passing modified data
1018           on to a downstream handler.  Filters can be chained together in
1019           'pipelines'.
1020
1021       ·   One filter module might split a data stream to direct data to two
1022           or more downstream handlers.
1023
1024       ·   Generating SAX events is not the exclusive preserve of XML parsing
1025           modules.  For example, a module might extract data from a
1026           relational database using DBI and pass it on to a SAX pipeline for
1027           filtering and formatting.
1028
1029       XML::Simple can operate at either end of a SAX pipeline.  For example,
1030       you can take a data structure in the form of a hashref and pass it into
1031       a SAX pipeline using the 'Handler' option on "XMLout()":
1032
1033         use XML::Simple;
1034         use Some::SAX::Filter;
1035         use XML::SAX::Writer;
1036
1037         my $ref = {
1038                      ....   # your data here
1039                   };
1040
1041         my $writer = XML::SAX::Writer->new();
1042         my $filter = Some::SAX::Filter->new(Handler => $writer);
1043         my $simple = XML::Simple->new(Handler => $filter);
1044         $simple->XMLout($ref);
1045
1046       You can also put XML::Simple at the opposite end of the pipeline to
1047       take advantage of the simple 'tree' data structure once the relevant
1048       data has been isolated through filtering:
1049
1050         use XML::SAX;
1051         use Some::SAX::Filter;
1052         use XML::Simple;
1053
1054         my $simple = XML::Simple->new(ForceArray => 1, KeyAttr => ['partnum']);
1055         my $filter = Some::SAX::Filter->new(Handler => $simple);
1056         my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1057
1058         my $ref = $parser->parse_uri('some_huge_file.xml');
1059
1060         print $ref->{part}->{'555-1234'};
1061
1062       You can build a filter by using an XML::Simple object as a handler and
1063       setting its DataHandler option to point to a routine which takes the
1064       resulting tree, modifies it and sends it off as SAX events to a
1065       downstream handler:
1066
1067         my $writer = XML::SAX::Writer->new();
1068         my $filter = XML::Simple->new(
1069                        DataHandler => sub {
1070                                         my $simple = shift;
1071                                         my $data = shift;
1072
1073                                         # Modify $data here
1074
1075                                         $simple->XMLout($data, Handler => $writer);
1076                                       }
1077                      );
1078         my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
1079
1080         $parser->parse_uri($filename);
1081
1082       Note: In this last example, the 'Handler' option was specified in the
1083       call to "XMLout()" but it could also have been specified in the
1084       constructor.
1085

ENVIRONMENT

1087       If you don't care which parser module XML::Simple uses then skip this
1088       section entirely (it looks more complicated than it really is).
1089
1090       XML::Simple will default to using a SAX parser if one is available or
1091       XML::Parser if SAX is not available.
1092
1093       You can dictate which parser module is used by setting either the
1094       environment variable 'XML_SIMPLE_PREFERRED_PARSER' or the package
1095       variable $XML::Simple::PREFERRED_PARSER to contain the module name.
1096       The following rules are used:
1097
1098       ·   The package variable takes precedence over the environment variable
1099           if both are defined.  To force XML::Simple to ignore the
1100           environment settings and use its default rules, you can set the
1101           package variable to an empty string.
1102
1103       ·   If the 'preferred parser' is set to the string 'XML::Parser', then
1104           XML::Parser will be used (or "XMLin()" will die if XML::Parser is
1105           not installed).
1106
1107       ·   If the 'preferred parser' is set to some other value, then it is
1108           assumed to be the name of a SAX parser module and is passed to
1109           XML::SAX::ParserFactory.  If XML::SAX is not installed, or the
1110           requested parser module is not installed, then "XMLin()" will die.
1111
1112       ·   If the 'preferred parser' is not defined at all (the normal default
1113           state), an attempt will be made to load XML::SAX.  If XML::SAX is
1114           installed, then a parser module will be selected according to
1115           XML::SAX::ParserFactory's normal rules (which typically means the
1116           last SAX parser installed).
1117
1118       ·   If the 'preferred parser' is not defined and XML::SAX is not
1119           installed, then XML::Parser will be used.  "XMLin()" will die if
1120           XML::Parser is not installed.
1121
1122       Note: The XML::SAX distribution includes an XML parser written entirely
1123       in Perl.  It is very portable but it is not very fast.  You should
1124       consider installing XML::LibXML or XML::SAX::Expat if they are
1125       available for your platform.
1126
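       For example, a script could request XML::Parser for a single block of
       code with a sketch like this one.  It assumes nothing about which
       parsers are installed, so the call is wrapped in an eval.

```perl
use strict;
use warnings;
use XML::Simple;

my ($config, $error);
{
    # The package variable takes precedence over the environment variable
    # XML_SIMPLE_PREFERRED_PARSER; 'local' limits it to this block.
    local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';
    $config = eval { XMLin('<opt colour="red" />') };
    $error  = $@;
}

if ($error) {
    print "XML::Parser could not be used: $error";
}
else {
    print "colour: $config->{colour}\n";
}
```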

ERROR HANDLING

1128       The XML standard is very clear on the issue of non-compliant documents.
1129       An error in parsing any single element (for example a missing end tag)
1130       must cause the whole document to be rejected.  XML::Simple will die
1131       with an appropriate message if it encounters a parsing error.
1132
1133       If dying is not appropriate for your application, you should arrange to
1134       call "XMLin()" in an eval block and look for errors in $@.  eg:
1135
1136           my $config = eval { XMLin() };
1137           PopUpMessage($@) if($@);
1138
1139       Note, there is a common misconception that use of eval will
1140       significantly slow down a script.  While that may be true when the code
1141       being eval'd is in a string, it is not true of code like the sample
1142       above.
1143
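       A self-contained variant of the same idea, using a deliberately
       malformed string so that the error path is taken.  The wording of the
       message in $@ depends on which parser module is in use.

```perl
use strict;
use warnings;
use XML::Simple;

# The closing </item> tag is missing, so this is not well formed.
my $bad_xml = '<opt><item>one</opt>';

my $config = eval { XMLin($bad_xml) };
my $error  = $@;

if ($error) {
    warn "Could not parse configuration: $error";
}
```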

EXAMPLES

1145       When "XMLin()" reads the following very simple piece of XML:
1146
1147           <opt username="testuser" password="frodo"></opt>
1148
1149       it returns the following data structure:
1150
1151           {
1152             'username' => 'testuser',
1153             'password' => 'frodo'
1154           }
1155
1156       The identical result could have been produced with this alternative
1157       XML:
1158
1159           <opt username="testuser" password="frodo" />
1160
1161       Or this (although see 'ForceArray' option for variations):
1162
1163           <opt>
1164             <username>testuser</username>
1165             <password>frodo</password>
1166           </opt>
1167
1168       Repeated nested elements are represented as anonymous arrays:
1169
1170           <opt>
1171             <person firstname="Joe" lastname="Smith">
1172               <email>joe@smith.com</email>
1173               <email>jsmith@yahoo.com</email>
1174             </person>
1175             <person firstname="Bob" lastname="Smith">
1176               <email>bob@smith.com</email>
1177             </person>
1178           </opt>
1179
1180           {
1181             'person' => [
1182                           {
1183                             'email' => [
1184                                          'joe@smith.com',
1185                                          'jsmith@yahoo.com'
1186                                        ],
1187                             'firstname' => 'Joe',
1188                             'lastname' => 'Smith'
1189                           },
1190                           {
1191                             'email' => 'bob@smith.com',
1192                             'firstname' => 'Bob',
1193                             'lastname' => 'Smith'
1194                           }
1195                         ]
1196           }
1197
1198       Nested elements with a recognised key attribute are transformed
1199       (folded) from an array into a hash keyed on the value of that attribute
1200       (see the "KeyAttr" option):
1201
1202           <opt>
1203             <person key="jsmith" firstname="Joe" lastname="Smith" />
1204             <person key="tsmith" firstname="Tom" lastname="Smith" />
1205             <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
1206           </opt>
1207
1208           {
1209             'person' => {
1210                           'jbloggs' => {
1211                                          'firstname' => 'Joe',
1212                                          'lastname' => 'Bloggs'
1213                                        },
1214                           'tsmith' => {
1215                                         'firstname' => 'Tom',
1216                                         'lastname' => 'Smith'
1217                                       },
1218                           'jsmith' => {
1219                                         'firstname' => 'Joe',
1220                                         'lastname' => 'Smith'
1221                                       }
1222                         }
1223           }
1224
1225       The <anon> tag can be used to form anonymous arrays:
1226
1227           <opt>
1228             <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
1229             <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
1230             <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
1231             <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
1232           </opt>
1233
1234           {
1235             'head' => [
1236                         [ 'Col 1', 'Col 2', 'Col 3' ]
1237                       ],
1238             'data' => [
1239                         [ 'R1C1', 'R1C2', 'R1C3' ],
1240                         [ 'R2C1', 'R2C2', 'R2C3' ],
1241                         [ 'R3C1', 'R3C2', 'R3C3' ]
1242                       ]
1243           }
1244
1245       Anonymous arrays can be nested to arbitrary levels and as a special
1246       case, if the surrounding tags for an XML document contain only an
1247       anonymous array the arrayref will be returned directly rather than the
1248       usual hashref:
1249
1250           <opt>
1251             <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
1252             <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
1253             <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
1254           </opt>
1255
1256           [
1257             [ 'Col 1', 'Col 2' ],
1258             [ 'R1C1', 'R1C2' ],
1259             [ 'R2C1', 'R2C2' ]
1260           ]
1261
1262       Elements which only contain text content will simply be represented as
1263       a scalar.  Where an element has both attributes and text content, the
1264       element will be represented as a hashref with the text content in the
1265       'content' key (see the "ContentKey" option):
1266
1267         <opt>
1268           <one>first</one>
1269           <two attr="value">second</two>
1270         </opt>
1271
1272         {
1273           'one' => 'first',
1274           'two' => { 'attr' => 'value', 'content' => 'second' }
1275         }
1276
1277       Mixed content (elements which contain both text content and nested
1278       elements) will not be represented in a useful way - element order
1279       and significant whitespace will be lost.  If you need to work with
1280       mixed content, then XML::Simple is not the right tool for your job -
1281       check out the next section.
1282

WHERE TO FROM HERE?

1284       XML::Simple is able to present a simple API because it makes some
1285       assumptions on your behalf.  These include:
1286
1287       ·   You're not interested in text content consisting only of whitespace
1288
1289       ·   You don't mind that when things get slurped into a hash the order
1290           is lost
1291
1292       ·   You don't want fine-grained control of the formatting of generated
1293           XML
1294
1295       ·   You would never use a hash key that was not a legal XML element
1296           name
1297
1298       ·   You don't need help converting between different encodings
1299
1300       In a serious XML project, you'll probably outgrow these assumptions
1301       fairly quickly.  This section of the document used to offer some advice
1302       on choosing a more powerful option.  That advice has now grown into the
1303       'Perl-XML FAQ' document which you can find at:
1304       <http://perl-xml.sourceforge.net/faq/>
1305
1306       The advice in the FAQ boils down to a quick explanation of tree versus
1307       event based parsers and then recommends:
1308
1309       For event based parsing, use SAX (do not set out to write any new code
1310       for XML::Parser's handler API - it is obsolete).
1311
1312       For tree-based parsing, you could choose between the 'Perlish' approach
1313       of XML::Twig and more standards based DOM implementations - preferably
1314       one with XPath support.
1315

SEE ALSO

1317       XML::Simple requires either XML::Parser or XML::SAX.
1318
1319       To generate documents with namespaces, XML::NamespaceSupport is
1320       required.
1321
1322       The optional caching functions require Storable.
1323
1324       Answers to Frequently Asked Questions about XML::Simple are bundled
1325       with this distribution as: XML::Simple::FAQ
1326

COPYRIGHT

1328       Copyright 1999-2004 Grant McLean <grantm@cpan.org>
1329
1330       This library is free software; you can redistribute it and/or modify it
1331       under the same terms as Perl itself.
1332
1333
1334
1335perl v5.10.1                      2007-08-15                    XML::Simple(3)