XML::Simple::FAQ(3)   User Contributed Perl Documentation  XML::Simple::FAQ(3)

Frequently Asked Questions about XML::Simple

Basics

   What is XML::Simple designed to be used for?
       XML::Simple is a Perl module that was originally developed as a tool
       for reading and writing configuration data in XML format.  You can use
       it for many other purposes that involve storing and retrieving
       structured data in XML.

       You might also find XML::Simple a good starting point for playing with
       XML from Perl.  It doesn't have a steep learning curve, and if you
       outgrow its capabilities there are plenty of other Perl/XML modules to
       'step up' to.

   Why store configuration data in XML anyway?
       The many advantages of using XML format for configuration data include:

       ·   Using existing XML parsing tools takes less development time and is
           easier and more robust than developing your own config file parsing
           code

       ·   XML can represent relationships between pieces of data, such as
           nesting of sections to arbitrary levels (not easily done with .INI
           files, for example)

       ·   XML is basically just text, so you can easily edit a config file
           (easier than editing a Win32 registry)

       ·   XML provides standard solutions for handling character sets and
           encoding beyond basic ASCII (important for internationalization)

       ·   If it becomes necessary to change your configuration file format,
           there are many tools available for performing transformations on
           XML files

       ·   XML is an open standard (the world does not need more proprietary
           binary file formats)

       ·   Taking the extra step of developing a DTD allows the format of
           configuration files to be validated before your program reads them
           (not directly supported by XML::Simple)

       ·   Combining a DTD with a good XML editor can give you a GUI config
           editor for minimal coding effort

   What isn't XML::Simple good for?
       The main limitation of XML::Simple is that it does not work with 'mixed
       content' (see the next question).  If your XML files contain marked-up
       text rather than structured data, you should probably use another
       module.

       If you are working with very large XML files, XML::Simple's approach of
       representing the whole file in memory as a 'tree' data structure may
       not be suitable.

   What is mixed content?
       Consider this example XML:

         <document>
           <para>This is <em>mixed</em> content.</para>
         </document>

       This is said to be mixed content, because the <para> element contains
       both character data (text content) and nested elements.

       Here's some more XML:

         <person>
           <first_name>Joe</first_name>
           <last_name>Bloggs</last_name>
           <dob>25-April-1969</dob>
         </person>

       This second example is not generally considered to be mixed content.
       The <first_name>, <last_name> and <dob> elements contain only character
       data and the <person> element contains only nested elements.  (Note:
       strictly speaking, the whitespace between the nested elements is
       character data, but it is ignored by XML::Simple.)

   Why doesn't XML::Simple handle mixed content?
       Because if it did, it would no longer be simple :-)

       Seriously though, there are plenty of excellent modules that allow you
       to work with mixed content in a variety of ways.  Handling mixed
       content correctly is not easy, and by ignoring these issues XML::Simple
       is able to present an API without a steep learning curve.

   Which Perl modules do handle mixed content?
       Every one of them except XML::Simple :-)

       If you're looking for a recommendation, I'd suggest you look at the
       Perl-XML FAQ at:

         http://perl-xml.sourceforge.net/faq/


Installation

   How do I install XML::Simple?
       If you're running ActiveState Perl, you've probably already got
       XML::Simple (although you may want to upgrade to version 1.09 or better
       for SAX support).

       If you do need to install XML::Simple, you'll need to install an XML
       parser module first.  Install either XML::Parser (which you may have
       already) or XML::SAX.  If you install both, XML::SAX will be used by
       default.

       Once you have a parser installed ...

       On Unix systems, try:

         perl -MCPAN -e 'install XML::Simple'

       If that doesn't work, download the latest distribution from
       ftp://ftp.cpan.org/pub/CPAN/authors/id/G/GR/GRANTM , unpack it and run
       these commands:

         perl Makefile.PL
         make
         make test
         make install

       On Win32, if you have a recent build of ActiveState Perl (618 or
       better) try this command:

         ppm install XML::Simple

       If that doesn't work, you really only need the Simple.pm file, so
       extract it from the .tar.gz file (eg: using WinZIP) and save it in the
       \site\lib\XML directory under your Perl installation (typically
       C:\Perl).

   I'm trying to install XML::Simple and 'make test' fails
       Is the directory where you've unpacked XML::Simple mounted from a file
       server using NFS, SMB or some other network file sharing?  If so, that
       may cause errors in the following test scripts:

         3_Storable.t
         4_MemShare.t
         5_MemCopy.t

       The test suite is designed to exercise the boundary conditions of all
       of XML::Simple's functionality, and these three scripts exercise the
       caching functions.  If XML::Simple is asked to parse a file for which
       it has a cached copy of a previous parse, then it compares the
       timestamp on the XML file with the timestamp on the cached copy.  If
       the cached copy is *newer* then it will be used.  If the cached copy is
       older or the same age then the file is re-parsed.  The test scripts
       will get confused by networked filesystems if the workstation and
       server system clocks are not synchronised (to the second).

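       The freshness rule just described can be sketched in plain Perl.  This
       is not XML::Simple's own caching code: cache_is_fresh() is a
       hypothetical helper that only mirrors the rule as stated above, using
       core Perl's stat() and utime() on two temporary files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper mirroring the rule described above (not XML::Simple's
# internal code): the cache is usable only if it exists and its mtime is
# strictly newer than the XML file's mtime.
sub cache_is_fresh {
    my ($xml_file, $cache_file) = @_;
    return 0 unless -e $cache_file;
    my $xml_mtime   = (stat $xml_file)[9];     # element 9 of stat() is mtime
    my $cache_mtime = (stat $cache_file)[9];
    return $cache_mtime > $xml_mtime ? 1 : 0;  # same age or older: re-parse
}

# Demonstration with two temporary files and explicitly set mtimes.
my (undef, $xml_file)   = tempfile(UNLINK => 1);
my (undef, $cache_file) = tempfile(UNLINK => 1);

utime 1000, 1000, $xml_file;
utime 2000, 2000, $cache_file;
print "cache newer:    ", cache_is_fresh($xml_file, $cache_file), "\n";

utime 2000, 2000, $xml_file;
print "same timestamp: ", cache_is_fresh($xml_file, $cache_file), "\n";
```

       On a networked filesystem the two timestamps may effectively come from
       different clocks, so a comparison like this one can go either way,
       which is the confusion described above.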
       If you get an error in one of these three test scripts but you don't
       plan to use the caching options (they're not enabled by default), then
       go right ahead and run 'make install'.  If you do plan to use caching,
       then try unpacking the distribution on local disk and doing the
       build/test there.

       It's probably not a good idea to use the caching options with networked
       filesystems in production.  If the file server's clock is ahead of the
       local clock, XML::Simple will re-parse files when it could have used
       the cached copy.  However, if the local clock is ahead of the file
       server clock and a file is changed immediately after it is cached, the
       old cached copy will be used.

       Is one of the three test scripts (above) failing but you're not running
       on a network filesystem?  Are you running Win32?  If so, you may be
       seeing a bug in Win32 where writes to a file do not affect its
       modification timestamp.

       If none of these scenarios match your situation, please confirm you're
       running the latest version of XML::Simple and then email the output of
       'make test' to me at grantm@cpan.org

   Why is XML::Simple so slow?
       If you find that XML::Simple is very slow reading XML, the most likely
       reason is that you have XML::SAX installed but no additional SAX parser
       module.  The XML::SAX distribution includes an XML parser written
       entirely in Perl.  This is very portable but not very fast.  For better
       performance install either XML::SAX::Expat or XML::LibXML.


Usage

   How do I use XML::Simple?
       If you had an XML document called /etc/appconfig/foo.xml you could
       'slurp' it into a simple data structure (typically a hashref) with
       these lines of code:

         use XML::Simple;

         my $config = XMLin('/etc/appconfig/foo.xml');

       The XMLin() function accepts options after the filename.

   There are so many options, which ones do I really need to know about?
       Although you can get by without using any options, you shouldn't even
       consider using XML::Simple in production until you know what these two
       options do:

       ·   forcearray

       ·   keyattr

       You really need to read about them because the default values for
       these options will trip you up if you don't.  Although everyone agrees
       that these defaults are not ideal, there is no wide agreement on what
       they should be changed to.  The answer, therefore, is to read about
       them (see below) and select values which are right for you.

   What is the forcearray option all about?
       Consider this XML in a file called ./person.xml:

         <person>
           <first_name>Joe</first_name>
           <last_name>Bloggs</last_name>
           <hobbie>bungy jumping</hobbie>
           <hobbie>sky diving</hobbie>
           <hobbie>knitting</hobbie>
         </person>

       You could read it in with this line:

         my $person = XMLin('./person.xml');

       Which would give you a data structure like this:

         $person = {
           'first_name' => 'Joe',
           'last_name'  => 'Bloggs',
           'hobbie'     => [ 'bungy jumping', 'sky diving', 'knitting' ]
         };

       The <first_name> and <last_name> elements are represented as simple
       scalar values which you could refer to like this:

         print "$person->{first_name} $person->{last_name}\n";

       The <hobbie> elements are represented as an array, since there is more
       than one.  You could refer to the first one like this:

         print $person->{hobbie}->[0], "\n";

       Or the whole lot like this:

         print join(', ', @{$person->{hobbie}} ), "\n";

       The catch is that these last two lines of code will only work for
       people who have more than one hobbie.  If there is only one <hobbie>
       element, it will be represented as a simple scalar (just like
       <first_name> and <last_name>).  This might lead you to write code like
       this:

         if(ref($person->{hobbie})) {
           print join(', ', @{$person->{hobbie}} ), "\n";
         }
         else {
           print $person->{hobbie}, "\n";
         }

       Don't do that.

       One alternative approach is to set the forcearray option to a true
       value:

         my $person = XMLin('./person.xml', forcearray => 1);

       Which will give you a data structure like this:

         $person = {
           'first_name' => [ 'Joe' ],
           'last_name'  => [ 'Bloggs' ],
           'hobbie'     => [ 'bungy jumping', 'sky diving', 'knitting' ]
         };

       Then you can use this line to refer to the whole list of hobbies, even
       if there is only one:

         print join(', ', @{$person->{hobbie}} ), "\n";

       The downside of this approach is that the <first_name> and <last_name>
       elements will also always be represented as arrays even though there
       will never be more than one:

         print "$person->{first_name}->[0] $person->{last_name}->[0]\n";

       This might be OK if you change the XML to use attributes for things
       that will always be singular and nested elements for things that may be
       plural:

         <person first_name="Jane" last_name="Bloggs">
           <hobbie>motorcycle maintenance</hobbie>
         </person>

       On the other hand, if you prefer not to use attributes, then you could
       specify that any <hobbie> elements should always be represented as
       arrays and all other nested elements should be simple scalar values
       unless there is more than one:

         my $person = XMLin('./person.xml', forcearray => [ 'hobbie' ]);

       The forcearray option accepts a list of element names which should
       always be forced to an array representation:

         forcearray => [ qw(hobbie qualification childs_name) ]

       See the XML::Simple manual page for more information.

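       To illustrate the forcearray => [ 'hobbie' ] approach, the sketch below
       hard-codes data structures of the shape described above rather than
       calling XMLin(): 'hobbie' is always an arrayref, while <first_name> and
       <last_name> stay plain scalars.  The describe_person() helper and the
       extra 'Fred' record are hypothetical; the point is that one code path
       handles one hobbie or many, with no ref() test.

```perl
use strict;
use warnings;

# Structures of the shape described above for forcearray => [ 'hobbie' ]
# (written out by hand here, not produced by XMLin()).
my @people = (
    { first_name => 'Joe',  last_name => 'Bloggs',
      hobbie     => [ 'bungy jumping', 'sky diving', 'knitting' ] },
    { first_name => 'Jane', last_name => 'Bloggs',
      hobbie     => [ 'motorcycle maintenance' ] },
    { first_name => 'Fred', last_name => 'Bloggs' },   # no <hobbie> at all
);

# Hypothetical helper: the same dereference works for one hobbie or many.
# forcearray only applies to elements that are present, so a person with no
# <hobbie> elements has no 'hobbie' key; fall back to an empty list.
sub describe_person {
    my ($person) = @_;
    my @hobbies = @{ $person->{hobbie} || [] };
    return "$person->{first_name} $person->{last_name}: "
         . (@hobbies ? join(', ', @hobbies) : '(none)');
}

print describe_person($_), "\n" for @people;
```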
   What is the keyattr option all about?
       Consider this sample XML:

         <catalog>
           <part partnum="1842334" desc="High pressure flange" price="24.50" />
           <part partnum="9344675" desc="Threaded gasket"      price="9.25" />
           <part partnum="5634896" desc="Low voltage washer"   price="12.00" />
         </catalog>

       You could slurp it in with this code:

         my $catalog = XMLin('./catalog.xml');

       Which would return a data structure like this:

         $catalog = {
             'part' => [
                 {
                   'partnum' => '1842334',
                   'desc'    => 'High pressure flange',
                   'price'   => '24.50'
                 },
                 {
                   'partnum' => '9344675',
                   'desc'    => 'Threaded gasket',
                   'price'   => '9.25'
                 },
                 {
                   'partnum' => '5634896',
                   'desc'    => 'Low voltage washer',
                   'price'   => '12.00'
                 }
             ]
         };

       Then you could access the description of the first part in the catalog
       with this code:

         print $catalog->{part}->[0]->{desc}, "\n";

       However, if you wanted to access the description of the part with the
       part number of "9344675" then you'd have to code a loop like this:

         foreach my $part (@{$catalog->{part}}) {
           if($part->{partnum} eq '9344675') {
             print $part->{desc}, "\n";
             last;
           }
         }

       The knowledge that each <part> element has a unique partnum attribute
       allows you to eliminate this search.  You can pass this knowledge on to
       XML::Simple like this:

         my $catalog = XMLin($xml, keyattr => ['partnum']);

       Which will return a data structure like this:

         $catalog = {
           'part' => {
             '5634896' => { 'desc' => 'Low voltage washer',   'price' => '12.00' },
             '1842334' => { 'desc' => 'High pressure flange', 'price' => '24.50' },
             '9344675' => { 'desc' => 'Threaded gasket',      'price' => '9.25'  }
           }
         };

       XML::Simple has been able to transform $catalog->{part} from an
       arrayref to a hashref (keyed on partnum).  This transformation is
       called 'array folding'.

       Through the use of array folding, you can now index directly to the
       description of the part you want:

         print $catalog->{part}->{9344675}->{desc}, "\n";

       The 'keyattr' option also enables array folding when the unique key is
       in a nested element rather than an attribute.  eg:

         <catalog>
           <part>
             <partnum>1842334</partnum>
             <desc>High pressure flange</desc>
             <price>24.50</price>
           </part>
           <part>
             <partnum>9344675</partnum>
             <desc>Threaded gasket</desc>
             <price>9.25</price>
           </part>
           <part>
             <partnum>5634896</partnum>
             <desc>Low voltage washer</desc>
             <price>12.00</price>
           </part>
         </catalog>

       See the XML::Simple manual page for more information.

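       Array folding itself is a simple transformation.  The sketch below is
       not XML::Simple's implementation: fold_on() is a hypothetical helper
       that turns an arrayref of hashrefs into a hashref keyed on one field,
       removing that field from each record as in the folded $catalog above.
       Since folding is said to fail when an element is missing the key
       attribute, this sketch returns the original array unchanged in that
       case.

```perl
use strict;
use warnings;

# Hypothetical helper (not XML::Simple's own code): fold an arrayref of
# hashrefs into a hashref keyed on $key_name.  The key field is removed from
# each record.  If any record is not a hash or lacks the key, the original
# arrayref is returned unchanged, i.e. no folding happens.
sub fold_on {
    my ($records, $key_name) = @_;
    my %folded;
    for my $record (@$records) {
        return $records
            unless ref($record) eq 'HASH' && defined $record->{$key_name};
        my %copy = %$record;                  # don't modify the caller's data
        my $key  = delete $copy{$key_name};
        $folded{$key} = \%copy;
    }
    return \%folded;
}

# The unfolded structure from the first <catalog> example, written by hand.
my $catalog = { part => [
    { partnum => '1842334', desc => 'High pressure flange', price => '24.50' },
    { partnum => '9344675', desc => 'Threaded gasket',      price => '9.25'  },
    { partnum => '5634896', desc => 'Low voltage washer',   price => '12.00' },
] };

$catalog->{part} = fold_on($catalog->{part}, 'partnum');
print $catalog->{part}{9344675}{desc}, "\n";    # prints "Threaded gasket"
```

       A text-only nested <partnum> element also ends up as a plain hash
       value, so the same transformation applies to the second <catalog>
       example.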
   So what's the catch with 'keyattr'?
       One thing to watch out for is that you might get array folding even if
       you don't supply the keyattr option.  The default value for this option
       is:

         [ 'name', 'key', 'id' ]

       This means that if your XML elements have a 'name', 'key' or 'id'
       attribute (or nested element) then they may get folded on those values.
       It also means that you can take advantage of array folding simply
       through careful choice of attribute names.  On the other hand, if you
       really don't want array folding at all, you'll need to set 'keyattr' to
       an empty list:

         my $ref = XMLin($xml, keyattr => []);

       A second 'gotcha' is that array folding only works on arrays.  That
       might seem obvious, but if there's only one record in your XML and you
       didn't set the 'forcearray' option then it won't be represented as an
       array and consequently won't get folded into a hash.  The moral is that
       if you're using array folding, you should always turn on the forcearray
       option.

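       The single-record case is easiest to see by writing out the two shapes
       by hand.  Without forcearray, a <catalog> holding one <part> gives
       {part} as that part's own hashref, so nothing is folded; with
       forcearray => ['part'] the lone part sits in an array and is folded on
       partnum like the three-part catalog.  These structures are typed in to
       match the description above, not captured from XMLin().

```perl
use strict;
use warnings;

# One <part>, no forcearray: not an array, so no folding took place.
my $unfolded = {
    part => { partnum => '1842334', desc => 'High pressure flange', price => '24.50' },
};

# One <part> with forcearray => ['part'] and keyattr => { part => 'partnum' }.
my $folded = {
    part => { '1842334' => { desc => 'High pressure flange', price => '24.50' } },
};

# Code written for the folded shape silently finds nothing in the other one.
for my $catalog ($unfolded, $folded) {
    my $part    = $catalog->{part}{1842334};
    my $message = $part ? "found: $part->{desc}" : 'not found';
    print "$message\n";
}
```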
       You probably want to be as specific as you can be too.  For instance,
       the safest way to parse the <catalog> example above would be:

         my $catalog = XMLin($xml, keyattr => { part => 'partnum' },
                                   forcearray => ['part']);

       By using the hashref for keyattr, you can specify that only <part>
       elements should be folded on the 'partnum' attribute (and that the
       <part> elements should not be folded on any other attribute).

       By supplying a list of element names for forcearray, you're ensuring
       that folding will work even if there's only one <part>.  You're also
       ensuring that if the 'partnum' unique key is supplied in a nested
       element then that element won't get forced to an array too.

   How do I know what my data structure should look like?
       The rules are fairly straightforward:

       ·   each element gets represented as a hash

       ·   unless it contains only text, in which case it'll be a simple
           scalar value

       ·   or unless there's more than one element with the same name, in
           which case they'll be represented as an array

       ·   unless you've got array folding enabled, in which case they'll be
           folded into a hash

       ·   empty elements (no text contents and no attributes) will either be
           represented as an empty hash, an empty string or undef - depending
           on the value of the 'suppressempty' option.

       If you're in any doubt, use Data::Dumper, eg:

         use XML::Simple;
         use Data::Dumper;

         my $ref = XMLin($xml);

         print Dumper($ref);

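       As a complement to Data::Dumper, a small recursive walk can list every
       path in the structure along with the kind of value found there, which
       maps directly onto the rules above.  shapes() is a hypothetical helper
       written for illustration, and the sample structure is typed in by hand
       rather than read with XMLin().

```perl
use strict;
use warnings;

# Hypothetical helper: walk a nested structure and return one line per path,
# naming the kind of value there (HASH, ARRAY, scalar or undef).
sub shapes {
    my ($data, $path, $out) = @_;
    $path = '$ref' unless defined $path;
    $out  = []     unless defined $out;
    if (ref($data) eq 'HASH') {
        push @$out, "$path: HASH";
        shapes($data->{$_}, $path . '->{' . $_ . '}', $out) for sort keys %$data;
    }
    elsif (ref($data) eq 'ARRAY') {
        push @$out, "$path: ARRAY";
        shapes($data->[$_], $path . '->[' . $_ . ']', $out) for 0 .. $#$data;
    }
    else {
        push @$out, "$path: " . (defined $data ? 'scalar' : 'undef');
    }
    return $out;
}

# A structure like the forcearray => [ 'hobbie' ] result, written by hand.
my $ref = { first_name => 'Joe', hobbie => [ 'knitting' ] };
print "$_\n" for @{ shapes($ref) };
```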
   I'm getting 'Use of uninitialized value' warnings
       You're probably trying to index into a non-existent hash key - try
       Data::Dumper.

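       The sketch below reproduces the warning with a hand-written structure
       that has no 'hobbie' key.  It captures warnings through
       $SIG{__WARN__} so they can be inspected, then shows that testing with
       exists() first avoids the warning.

```perl
use strict;
use warnings;

# A person with no <hobbie> element: the 'hobbie' key simply isn't there.
my $person = { first_name => 'Joe', last_name => 'Bloggs' };

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Interpolating a missing key warns "Use of uninitialized value ..."
my $line = "Hobby: $person->{hobbie}\n";

# Checking for the key first avoids the warning.
my $hobby = exists $person->{hobbie} ? $person->{hobbie} : '(none)';
print "Hobby: $hobby\n";
print "warnings caught: ", scalar(@warnings), "\n";
```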
   I'm getting a 'Not an ARRAY reference' error
       Something that you expect to be an array is not.  The two most likely
       causes are that you forgot to use 'forcearray' or that the array got
       folded into a hash - try Data::Dumper.

   I'm getting a 'No such array field' error
       Something that you expect to be a hash is actually an array.  Perhaps
       array folding failed because one element was missing the key
       attribute - try Data::Dumper.

   I'm getting an 'Out of memory' error
       Something in the data structure is not as you expect and Perl may be
       trying unsuccessfully to autovivify things - try Data::Dumper.

       If you're already using Data::Dumper, try calling Dumper() immediately
       after XMLin() - ie: before you attempt to access anything in the data
       structure.

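       Autovivification is easy to trigger by accident: reading through a
       nested path creates the intermediate hashes or array slots on the way.
       The hand-written structure below shows both cases.  The array index is
       kept small here, but using a large number (such as a part number) as
       an array index in a nested read makes Perl extend the array to that
       size, which is one way memory use can explode.

```perl
use strict;
use warnings;

# The unfolded <catalog> shape, written by hand, with a single part.
my $catalog = { part => [ { partnum => '1842334', desc => 'High pressure flange' } ] };

# A mistyped key ('parts') in a nested read silently creates $catalog->{parts}
# (and $catalog->{parts}{9344675}) instead of failing.
my $desc = $catalog->{parts}{9344675}{desc};
my $status = exists $catalog->{parts} ? '{parts} was autovivified' : 'no change';
print "$status\n";

# Using a number as an array index in a nested read extends the array so that
# the element at that index can hold a new, empty hashref.
my $price = $catalog->{part}[5000]{price};
print "{part} now has ", scalar @{ $catalog->{part} }, " elements\n";
```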
   My element order is getting jumbled up
       If you read an XML file with XMLin() and then write it back out with
       XMLout(), the order of the elements will likely be different.
       (However, if you read the file back in with XMLin() you'll get the same
       Perl data structure.)

       The reordering happens because XML::Simple uses hashrefs to store your
       data and Perl hashes do not preserve the order of their keys.

       It is possible that a future version of XML::Simple will use
       Tie::IxHash to store the data in hashrefs which do retain the order.
       However, this will not fix all cases of element order being lost.

       If your application really is sensitive to element order, don't use
       XML::Simple (and don't put order-sensitive values in attributes).

   XML::Simple turns nested elements into attributes
       If you read an XML file with XMLin() and then write it back out with
       XMLout(), some data which was originally stored in nested elements may
       end up in attributes.  (However, if you read the file back in with
       XMLin() you'll get the same Perl data structure.)

       There are a number of ways you might handle this:

       ·   use the 'forcearray' option with XMLin()

       ·   use the 'noattr' option with XMLout()

       ·   live with it

       ·   don't use XML::Simple

   Why does XMLout() insert <name> elements (or attributes)?
       Try setting keyattr => [].

       When you call XMLin() to read XML, the 'keyattr' option controls
       whether arrays get 'folded' into hashes.  Similarly, when you call
       XMLout(), the 'keyattr' option controls whether hashes get 'unfolded'
       into arrays.  As described above, 'keyattr' is enabled by default.

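       Unfolding is the reverse of array folding: each key of a hash of
       hashes is pushed back into its record under a key attribute name.  The
       sketch assumes, as the question above suggests, that with the default
       keyattr that name is 'name'.  unfold_on() is a hypothetical helper for
       illustration, not XMLout()'s implementation, and the server data is
       invented.

```perl
use strict;
use warnings;

# Hypothetical helper: turn { key => { ... }, ... } back into a list of
# records, re-inserting each hash key under the field name $key_name.
# Keys are sorted only to make the result predictable.
sub unfold_on {
    my ($hash, $key_name) = @_;
    my @records;
    for my $key (sort keys %$hash) {
        my %record = ( $key_name => $key, %{ $hash->{$key} } );
        push @records, \%record;
    }
    return \@records;
}

# Invented data: a hash of hashes keyed on server name.
my $config = { server => { alpha => { ip => '10.0.0.1' },
                           beta  => { ip => '10.0.0.2' } } };

for my $record (@{ unfold_on($config->{server}, 'name') }) {
    print join(', ', map { $_ . '=' . $record->{$_} } sort keys %$record), "\n";
}
```

       Each record now carries a 'name' field, which is what ends up in the
       output as a name attribute or <name> element; setting keyattr => []
       stops XMLout() from treating hash keys this way.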
   Why are empty elements represented as empty hashes?
       An element is always represented as a hash unless it contains only
       text, in which case it is represented as a scalar string.

       If you would prefer empty elements to be represented as empty strings
       or the undefined value, set the 'suppressempty' option to '' or undef
       respectively.

   Why is ParserOpts deprecated?
       The "ParserOpts" option is a remnant of the time when XML::Simple only
       worked with the XML::Parser API.  Its value is completely ignored if
       you're using a SAX parser, so writing code which relied on it would bar
       you from taking advantage of SAX.

       Even if you are using XML::Parser, it is seldom necessary to pass
       options to the parser object.  A number of people have written to say
       they use this option to set XML::Parser's "ProtocolEncoding" option.
       Don't do that; it's wrong, Wrong, WRONG!  Fix the XML document so that
       it's well-formed and you won't have a problem.

       Having said all of that, as long as XML::Simple continues to support
       the XML::Parser API, this option will not be removed.  There are
       currently no plans to remove support for the XML::Parser API.

perl v5.10.1                      2004-11-20               XML::Simple::FAQ(3)