XML::Simple::FAQ(3)    User Contributed Perl Documentation    XML::Simple::FAQ(3)

What is XML::Simple designed to be used for?
    XML::Simple is a Perl module that was originally developed as a tool
    for reading and writing configuration data in XML format. You can use
    it for many other purposes that involve storing and retrieving
    structured data in XML.

    You might also find XML::Simple a good starting point for playing with
    XML from Perl. It doesn't have a steep learning curve and if you
    outgrow its capabilities there are plenty of other Perl/XML modules to
    'step up' to.

Why store configuration data in XML anyway?
    The many advantages of using XML format for configuration data include:

    · Using existing XML parsing tools requires less development time and
      is easier and more robust than developing your own config file
      parsing code

    · XML can represent relationships between pieces of data, such as
      nesting of sections to arbitrary levels (not easily done with .INI
      files for example)

    · XML is basically just text, so you can easily edit a config file
      (easier than editing a Win32 registry)

    · XML provides standard solutions for handling character sets and
      encodings beyond basic ASCII (important for internationalization)

    · If it becomes necessary to change your configuration file format,
      there are many tools available for performing transformations on
      XML files

    · XML is an open standard (the world does not need more proprietary
      binary file formats)

    · Taking the extra step of developing a DTD allows the format of
      configuration files to be validated before your program reads them
      (not directly supported by XML::Simple)

    · Combining a DTD with a good XML editor can give you a GUI config
      editor for minimal coding effort

What isn't XML::Simple good for?
    The main limitation of XML::Simple is that it does not work with 'mixed
    content' (see the next question). If your XML files contain marked-up
    text rather than structured data, you should probably use another
    module.

    If you are working with very large XML files, XML::Simple's approach of
    representing the whole file in memory as a 'tree' data structure may
    not be suitable.

What is mixed content?
    Consider this example XML:

        <document>
          <para>This is <em>mixed</em> content.</para>
        </document>

    This is said to be mixed content, because the <para> element contains
    both character data (text content) and nested elements.
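
    As a rough illustration (a sketch assuming XML::Simple and an XML
    parser are installed), here is what XMLin() makes of the example above
    when it is passed as a string. The text fragments are collected under
    the default 'content' key and the <em> element under its own key, so
    the position of <em> within the sentence is lost.

```perl
use strict;
use warnings;
use XML::Simple;

# XMLin() accepts a string of XML in place of a filename.
my $doc = XMLin('<document><para>This is <em>mixed</em> content.</para></document>');

# The text fragments land in an array under 'content' and the <em> text
# under 'em'; nothing records where <em> appeared between them.
my $para = $doc->{para};
print "em:      $para->{em}\n";
print "content: ", join('|', @{ $para->{content} }), "\n";
```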

    Here's some more XML:

        <person>
          <first_name>Joe</first_name>
          <last_name>Bloggs</last_name>
          <dob>25-April-1969</dob>
        </person>

    This second example is not generally considered to be mixed content.
    The <first_name>, <last_name> and <dob> elements contain only character
    data and the <person> element contains only nested elements. (Note:
    strictly speaking, the whitespace between the nested elements is
    character data, but it is ignored by XML::Simple.)

Why doesn't XML::Simple handle mixed content?
    Because if it did, it would no longer be simple :-)

    Seriously though, there are plenty of excellent modules that allow you
    to work with mixed content in a variety of ways. Handling mixed
    content correctly is not easy and by ignoring these issues, XML::Simple
    is able to present an API without a steep learning curve.

Which Perl modules do handle mixed content?
    Every one of them except XML::Simple :-)

    If you're looking for a recommendation, I'd suggest you look at the
    Perl-XML FAQ at:

        http://perl-xml.sourceforge.net/faq/

How do I install XML::Simple?
    If you're running ActiveState Perl, you've probably already got
    XML::Simple (although you may want to upgrade to version 1.09 or better
    for SAX support).

    If you do need to install XML::Simple, you'll need to install an XML
    parser module first. Install either XML::Parser (which you may have
    already) or XML::SAX. If you install both, XML::SAX will be used by
    default.

    Once you have a parser installed ...

    On Unix systems, try:

        perl -MCPAN -e 'install XML::Simple'

    If that doesn't work, download the latest distribution from
    ftp://ftp.cpan.org/pub/CPAN/authors/id/G/GR/GRANTM , unpack it and run
    these commands:

        perl Makefile.PL
        make
        make test
        make install

    On Win32, if you have a recent build of ActiveState Perl (618 or
    better) try this command:

        ppm install XML::Simple

    If that doesn't work, you really only need the Simple.pm file, so
    extract it from the .tar.gz file (eg: using WinZip) and save it in the
    \site\lib\XML directory under your Perl installation (typically
    C:\Perl).

I'm trying to install XML::Simple and 'make test' fails
    Is the directory where you've unpacked XML::Simple mounted from a file
    server using NFS, SMB or some other network file sharing? If so, that
    may cause errors in the following test scripts:

        3_Storable.t
        4_MemShare.t
        5_MemCopy.t

    The test suite is designed to exercise the boundary conditions of all
    XML::Simple's functionality and these three scripts exercise the
    caching functions. If XML::Simple is asked to parse a file for which
    it has a cached copy of a previous parse, then it compares the
    timestamp on the XML file with the timestamp on the cached copy. If
    the cached copy is *newer* then it will be used. If the cached copy is
    older or the same age then the file is re-parsed. The test scripts
    will get confused by networked filesystems if the workstation and
    server system clocks are not synchronised (to the second).
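
    The timestamp rule can be sketched in plain Perl. The helper below,
    cache_is_fresh(), is hypothetical: it is not part of XML::Simple,
    whose own cache code may differ in detail, and the file names are made
    up for the demonstration. It uses stat() to compare modification times
    and treats the cache as usable only when it is strictly newer than the
    XML file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: use the cached copy only if its modification time
# is strictly newer than the XML file's; otherwise re-parse.
sub cache_is_fresh {
    my ($xml_file, $cache_file) = @_;
    return 0 unless -e $cache_file;
    my $xml_mtime   = (stat $xml_file)[9];
    my $cache_mtime = (stat $cache_file)[9];
    return $cache_mtime > $xml_mtime ? 1 : 0;
}

# Demonstration with two empty files in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my ($xml, $cache) = ("$dir/foo.xml", "$dir/foo.stor");
for my $f ($xml, $cache) {
    open my $fh, '>', $f or die "$f: $!";
    close $fh;
}

utime 1000, 1000, $xml, $cache;   # same age
print cache_is_fresh($xml, $cache) ? "use cache\n" : "re-parse\n";   # re-parse

utime 1000, 1001, $cache;         # cache one second newer
print cache_is_fresh($xml, $cache) ? "use cache\n" : "re-parse\n";   # use cache
```

    A one-second clock skew between workstation and file server is enough
    to flip the result, which is why the caching tests are sensitive to
    unsynchronised clocks.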

    If you get an error in one of these three test scripts but you don't
    plan to use the caching options (they're not enabled by default), then
    go right ahead and run 'make install'. If you do plan to use caching,
    then try unpacking the distribution on local disk and doing the
    build/test there.

    It's probably not a good idea to use the caching options with networked
    filesystems in production. If the file server's clock is ahead of the
    local clock, XML::Simple will re-parse files when it could have used
    the cached copy. However, if the local clock is ahead of the file
    server clock and a file is changed immediately after it is cached, the
    old cached copy will be used.

    Is one of the three test scripts (above) failing but you're not running
    on a network filesystem? Are you running Win32? If so, you may be
    seeing a bug in Win32 where writes to a file do not affect its
    modification timestamp.

    If none of these scenarios match your situation, please confirm you're
    running the latest version of XML::Simple and then email the output of
    'make test' to me at grantm@cpan.org

Why is XML::Simple so slow?
    If you find that XML::Simple is very slow reading XML, the most likely
    reason is that you have XML::SAX installed but no additional SAX parser
    module. The XML::SAX distribution includes an XML parser written
    entirely in Perl. This is very portable but not very fast. For better
    performance install either XML::SAX::Expat or XML::LibXML.
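
    To see which SAX parsers are installed, you can ask XML::SAX directly.
    This is a sketch assuming XML::SAX is installed; its parsers() class
    method returns a reference to a list of the registered parser modules,
    each described by a hash with a 'Name' key.

```perl
use strict;
use warnings;
use XML::SAX;

# XML::SAX keeps a registry of installed SAX parser modules.
my @names = map { $_->{Name} } @{ XML::SAX->parsers() };
print "$_\n" for @names;

# If the pure-Perl parser is the only one registered, that is the slow path.
print "Only the pure-Perl parser is available\n"
    if @names == 1 && $names[0] eq 'XML::SAX::PurePerl';
```

    If you want XML::Simple to use a particular parser regardless of what
    else is installed, it also honours the $XML::Simple::PREFERRED_PARSER
    package variable (for example, set to 'XML::Parser'); see the
    XML::Simple manual page for details.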

How do I use XML::Simple?
    If you had an XML document called /etc/appconfig/foo.xml you could
    'slurp' it into a simple data structure (typically a hashref) with
    these lines of code:

        use XML::Simple;

        my $config = XMLin('/etc/appconfig/foo.xml');

    The XMLin() function accepts options after the filename.
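
    For example, in this minimal sketch the options follow the first
    argument as name => value pairs. The first argument here is a string of
    made-up XML rather than a filename (XMLin() accepts either), and the
    two options shown are the ones discussed in the next answer.

```perl
use strict;
use warnings;
use XML::Simple;

# Options follow the filename (or XML string) as name => value pairs.
my $config = XMLin(
    '<config><server name="alpha" port="8080"/></config>',
    forcearray => 1,    # always use arrays for nested elements
    keyattr    => [],   # never fold arrays into hashes
);

print $config->{server}[0]{name}, "\n";   # alpha
```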

There are so many options, which ones do I really need to know about?
    Although you can get by without using any options, you shouldn't even
    consider using XML::Simple in production until you know what these two
    options do:

    · forcearray

    · keyattr

    You really need to read about them because the default values for
    these options will trip you up if you don't. Although everyone agrees
    that these defaults are not ideal, there is not wide agreement on what
    they should be changed to. The answer therefore is to read about them
    (see below) and select values which are right for you.

What is the forcearray option all about?
    Consider this XML in a file called ./person.xml:

        <person>
          <first_name>Joe</first_name>
          <last_name>Bloggs</last_name>
          <hobbie>bungy jumping</hobbie>
          <hobbie>sky diving</hobbie>
          <hobbie>knitting</hobbie>
        </person>

    You could read it in with this line:

        my $person = XMLin('./person.xml');

    Which would give you a data structure like this:

        $person = {
          'first_name' => 'Joe',
          'last_name'  => 'Bloggs',
          'hobbie'     => [ 'bungy jumping', 'sky diving', 'knitting' ]
        };

    The <first_name> and <last_name> elements are represented as simple
    scalar values which you could refer to like this:

        print "$person->{first_name} $person->{last_name}\n";

    The <hobbie> elements are represented as an array, since there is more
    than one. You could refer to the first one like this:

        print $person->{hobbie}->[0], "\n";

    Or the whole lot like this:

        print join(', ', @{$person->{hobbie}} ), "\n";

    The catch is that these last two lines of code will only work for
    people who have more than one hobby. If there is only one <hobbie>
    element, it will be represented as a simple scalar (just like
    <first_name> and <last_name>). This might lead you to write code like
    this:

        if(ref($person->{hobbie})) {
          print join(', ', @{$person->{hobbie}} ), "\n";
        }
        else {
          print $person->{hobbie}, "\n";
        }

    Don't do that.

    One alternative approach is to set the forcearray option to a true
    value:

        my $person = XMLin('./person.xml', forcearray => 1);

    Which will give you a data structure like this:

        $person = {
          'first_name' => [ 'Joe' ],
          'last_name'  => [ 'Bloggs' ],
          'hobbie'     => [ 'bungy jumping', 'sky diving', 'knitting' ]
        };

    Then you can use this line to refer to the whole list of hobbies even
    if there was only one:

        print join(', ', @{$person->{hobbie}} ), "\n";

    The downside of this approach is that the <first_name> and <last_name>
    elements will also always be represented as arrays even though there
    will never be more than one:

        print "$person->{first_name}->[0] $person->{last_name}->[0]\n";

    This might be OK if you change the XML to use attributes for things
    that will always be singular and nested elements for things that may be
    plural:

        <person first_name="Jane" last_name="Bloggs">
          <hobbie>motorcycle maintenance</hobbie>
        </person>

    On the other hand, if you prefer not to use attributes, then you could
    specify that any <hobbie> elements should always be represented as
    arrays and all other nested elements should be simple scalar values
    unless there is more than one:

        my $person = XMLin('./person.xml', forcearray => [ 'hobbie' ]);

    The forcearray option accepts a list of element names which should
    always be forced to an array representation:

        forcearray => [ qw(hobbie qualification childs_name) ]
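
    The difference is easy to see with a person who has only one hobby.
    This sketch (assuming XML::Simple is installed, with made-up XML passed
    as a string) reads the same document with and without
    forcearray => [ 'hobbie' ]:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<person><first_name>Jane</first_name><hobbie>knitting</hobbie></person>';

# Without forcearray, a single <hobbie> comes back as a plain scalar.
my $default = XMLin($xml);
print ref($default->{hobbie}) ? "array\n" : "scalar\n";   # scalar

# Forcing only 'hobbie' leaves the other elements as simple scalars.
my $person = XMLin($xml, forcearray => [ 'hobbie' ]);
print join(', ', @{ $person->{hobbie} }), "\n";           # knitting
print $person->{first_name}, "\n";                        # Jane
```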

    See the XML::Simple manual page for more information.

What is the keyattr option all about?
    Consider this sample XML:

        <catalog>
          <part partnum="1842334" desc="High pressure flange" price="24.50" />
          <part partnum="9344675" desc="Threaded gasket" price="9.25" />
          <part partnum="5634896" desc="Low voltage washer" price="12.00" />
        </catalog>

    You could slurp it in with this code:

        my $catalog = XMLin('./catalog.xml');

    Which would return a data structure like this:

        $catalog = {
          'part' => [
            {
              'partnum' => '1842334',
              'desc'    => 'High pressure flange',
              'price'   => '24.50'
            },
            {
              'partnum' => '9344675',
              'desc'    => 'Threaded gasket',
              'price'   => '9.25'
            },
            {
              'partnum' => '5634896',
              'desc'    => 'Low voltage washer',
              'price'   => '12.00'
            }
          ]
        };

    Then you could access the description of the first part in the catalog
    with this code:

        print $catalog->{part}->[0]->{desc}, "\n";

    However, if you wanted to access the description of the part with the
    part number of "9344675" then you'd have to code a loop like this:

        foreach my $part (@{$catalog->{part}}) {
          if($part->{partnum} eq '9344675') {
            print $part->{desc}, "\n";
            last;
          }
        }

    The knowledge that each <part> element has a unique partnum attribute
    allows you to eliminate this search. You can pass this knowledge on to
    XML::Simple like this:

        my $catalog = XMLin($xml, keyattr => ['partnum']);

    Which will return a data structure like this:

        $catalog = {
          'part' => {
            '5634896' => { 'desc' => 'Low voltage washer',   'price' => '12.00' },
            '1842334' => { 'desc' => 'High pressure flange', 'price' => '24.50' },
            '9344675' => { 'desc' => 'Threaded gasket',      'price' => '9.25' }
          }
        };

    XML::Simple has been able to transform $catalog->{part} from an
    arrayref to a hashref (keyed on partnum). This transformation is
    called 'array folding'.

    Through the use of array folding, you can now index directly to the
    description of the part you want:

        print $catalog->{part}->{9344675}->{desc}, "\n";
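
    Putting the pieces together, here is a self-contained sketch (assuming
    XML::Simple is installed) that parses the catalog above from a string,
    first with folding switched off and then with keyattr => ['partnum']:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = q{
  <catalog>
    <part partnum="1842334" desc="High pressure flange" price="24.50" />
    <part partnum="9344675" desc="Threaded gasket" price="9.25" />
    <part partnum="5634896" desc="Low voltage washer" price="12.00" />
  </catalog>
};

# No folding: an array that has to be searched.
my $list = XMLin($xml, keyattr => []);
my ($gasket) = grep { $_->{partnum} eq '9344675' } @{ $list->{part} };
print $gasket->{desc}, "\n";                   # Threaded gasket

# Folded on partnum: a hash that can be indexed directly.
my $catalog = XMLin($xml, keyattr => ['partnum']);
print $catalog->{part}{9344675}{desc}, "\n";   # Threaded gasket
```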

    The 'keyattr' option also enables array folding when the unique key is
    in a nested element rather than an attribute. eg:

        <catalog>
          <part>
            <partnum>1842334</partnum>
            <desc>High pressure flange</desc>
            <price>24.50</price>
          </part>
          <part>
            <partnum>9344675</partnum>
            <desc>Threaded gasket</desc>
            <price>9.25</price>
          </part>
          <part>
            <partnum>5634896</partnum>
            <desc>Low voltage washer</desc>
            <price>12.00</price>
          </part>
        </catalog>
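
    A sketch (assuming XML::Simple is installed, and trimmed to two parts
    for brevity) suggesting that the same keyattr setting folds this
    version of the catalog into the same shape as the attribute-based one:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = q{
  <catalog>
    <part><partnum>1842334</partnum><desc>High pressure flange</desc><price>24.50</price></part>
    <part><partnum>9344675</partnum><desc>Threaded gasket</desc><price>9.25</price></part>
  </catalog>
};

# The key now comes from a nested <partnum> element, but folding works
# on the resulting hash key just as it does for an attribute.
my $catalog = XMLin($xml, keyattr => ['partnum']);
print $catalog->{part}{9344675}{desc}, "\n";   # Threaded gasket
```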

    See the XML::Simple manual page for more information.

So what's the catch with 'keyattr'?
    One thing to watch out for is that you might get array folding even if
    you don't supply the keyattr option. The default value for this option
    is:

        [ 'name', 'key', 'id' ]

    Which means if your XML elements have a 'name', 'key' or 'id' attribute
    (or nested element) then they may get folded on those values. This
    means that you can take advantage of array folding simply through
    careful choice of attribute names. On the other hand, if you really
    don't want array folding at all, you'll need to set 'keyattr' to an
    empty list:

        my $ref = XMLin($xml, keyattr => []);
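
    The sketch below (assuming XML::Simple is installed, with a made-up
    <user> list) shows the default folding kicking in on a 'name'
    attribute, then being switched off with keyattr => []:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<config><user name="joe" uid="1001"/><user name="sue" uid="1002"/></config>';

# Default keyattr is [ 'name', 'key', 'id' ], so the <user> array is
# folded on the 'name' attribute without being asked.
my $folded = XMLin($xml);
print $folded->{user}{joe}{uid}, "\n";   # 1001

# An empty keyattr list switches folding off.
my $plain = XMLin($xml, keyattr => []);
print $plain->{user}[1]{name}, "\n";     # sue
```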

    A second 'gotcha' is that array folding only works on arrays. That
    might seem obvious, but if there's only one record in your XML and you
    didn't set the 'forcearray' option then it won't be represented as an
    array and consequently won't get folded into a hash. The moral is that
    if you're using array folding, you should always turn on the forcearray
    option.

    You probably want to be as specific as you can be too. For instance,
    the safest way to parse the <catalog> example above would be:

        my $catalog = XMLin($xml, keyattr    => { part => 'partnum' },
                                  forcearray => [ 'part' ]);

    By using the hashref for keyattr, you can specify that only <part>
    elements should be folded on the 'partnum' attribute (and that the
    <part> elements should not be folded on any other attribute).

    By supplying a list of element names for forcearray, you're ensuring
    that folding will work even if there's only one <part>. You're also
    ensuring that if the 'partnum' unique key is supplied in a nested
    element then that element won't get forced to an array too.
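
    To see the single-record gotcha in action, this sketch (assuming
    XML::Simple is installed) parses a made-up catalog containing only one
    <part>, first with keyattr alone and then with both options:

```perl
use strict;
use warnings;
use XML::Simple;

# A catalog that happens to contain only one part.
my $xml = '<catalog><part partnum="1842334" desc="High pressure flange" price="24.50"/></catalog>';

# keyattr alone: the single <part> is a hash, not an array, so it is
# never folded and partnum stays an ordinary key.
my $loose = XMLin($xml, keyattr => { part => 'partnum' });
print exists($loose->{part}{partnum}) ? "not folded\n" : "folded\n";   # not folded

# keyattr plus forcearray: the one-element array is folded as expected.
my $strict = XMLin($xml, keyattr    => { part => 'partnum' },
                         forcearray => [ 'part' ]);
print $strict->{part}{1842334}{desc}, "\n";   # High pressure flange
```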

How do I know what my data structure should look like?
    The rules are fairly straightforward:

    · each element gets represented as a hash

    · unless it contains only text, in which case it'll be a simple
      scalar value

    · or unless there's more than one element with the same name, in
      which case they'll be represented as an array

    · unless you've got array folding enabled, in which case they'll be
      folded into a hash

    · empty elements (no text contents and no attributes) will either be
      represented as an empty hash, an empty string or undef, depending
      on the value of the 'suppressempty' option.

    If you're in any doubt, use Data::Dumper, eg:

        use XML::Simple;
        use Data::Dumper;

        my $ref = XMLin($xml);

        print Dumper($ref);

I'm getting 'Use of uninitialized value' warnings
    You're probably trying to index into a non-existent hash key - try
    Data::Dumper.

I'm getting a 'Not an ARRAY reference' error
    Something that you expect to be an array is not. The two most likely
    causes are that you forgot to use 'forcearray' or that the array got
    folded into a hash - try Data::Dumper.

I'm getting a 'No such array field' error
    Something that you expect to be a hash is actually an array. Perhaps
    array folding failed because one element was missing the key
    attribute - try Data::Dumper.

I'm getting an 'Out of memory' error
    Something in the data structure is not as you expect and Perl may be
    trying unsuccessfully to autovivify things - try Data::Dumper.

    If you're already using Data::Dumper, try calling Dumper() immediately
    after XMLin() - ie: before you attempt to access anything in the data
    structure.

My element order is getting jumbled up
    If you read an XML file with XMLin() and then write it back out with
    XMLout(), the order of the elements will likely be different.
    (However, if you read the file back in with XMLin() you'll get the same
    Perl data structure.)

    The reordering happens because XML::Simple uses hashrefs to store your
    data and Perl hashes do not really have any order.

    It is possible that a future version of XML::Simple will use
    Tie::IxHash to store the data in hashrefs which do retain the order.
    However, this will not fix all cases of element order being lost.

    If your application really is sensitive to element order, don't use
    XML::Simple (and don't put order-sensitive values in attributes).

XML::Simple turns nested elements into attributes
    If you read an XML file with XMLin() and then write it back out with
    XMLout(), some data which was originally stored in nested elements may
    end up in attributes. (However, if you read the file back in with
    XMLin() you'll get the same Perl data structure.)

    There are a number of ways you might handle this:

    · use the 'forcearray' option with XMLin()

    · use the 'noattr' option with XMLout()

    · live with it

    · don't use XML::Simple
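
    A sketch of the 'noattr' approach (assuming XML::Simple is installed,
    with made-up data): the structure read by XMLin() is written back out
    twice, once with default settings and once with noattr => 1. The
    rootname option is used only to keep the original element name instead
    of XMLout()'s default <opt>.

```perl
use strict;
use warnings;
use XML::Simple;

my $ref = XMLin('<person><first_name>Joe</first_name><last_name>Bloggs</last_name></person>');

# Default XMLout(): scalar values are written as attributes of <person>.
my $as_attrs = XMLout($ref, rootname => 'person');
print $as_attrs;

# noattr => 1: every value is written as a nested element instead.
my $as_elems = XMLout($ref, rootname => 'person', noattr => 1);
print $as_elems;

# Either form reads back into the same Perl data structure.
print XMLin($as_elems)->{first_name}, "\n";   # Joe
```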

Why does XMLout() insert <name> elements (or attributes)?
    Try setting keyattr => [].

    When you call XMLin() to read XML, the 'keyattr' option controls
    whether arrays get 'folded' into hashes. Similarly, when you call
    XMLout(), the 'keyattr' option controls whether hashes get 'unfolded'
    into arrays. As described above, 'keyattr' is enabled by default.
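
    For instance, with a made-up hash of hashes like the one below,
    XMLout() with the default keyattr treats the inner hash as a folded
    array and writes its key back out as a name attribute; with
    keyattr => [] the key becomes an element name instead. A sketch,
    assuming XML::Simple is installed:

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical data: users keyed on their login name.
my $data = { user => { joe => { uid => 1001 } } };

# Default keyattr: the 'joe' key is unfolded into a name="joe" attribute.
my $unfolded = XMLout($data);
print $unfolded;

# keyattr => []: the 'joe' key is written as an element name instead.
my $plain = XMLout($data, keyattr => []);
print $plain;
```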

Why are empty elements represented as empty hashes?
    An element is always represented as a hash unless it contains only
    text, in which case it is represented as a scalar string.

    If you would prefer empty elements to be represented as empty strings
    or the undefined value, set the 'suppressempty' option to '' or undef
    respectively.
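
    A short sketch (assuming XML::Simple is installed, with a made-up
    <logfile/> element) comparing the three representations:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<config><logfile/><loglevel>3</loglevel></config>';

# Default: an empty element becomes an empty hashref.
my $default = XMLin($xml);

# suppressempty => '' gives an empty string instead.
my $empty = XMLin($xml, suppressempty => '');

# suppressempty => undef gives the undefined value.
my $undef = XMLin($xml, suppressempty => undef);

print ref($default->{logfile}), "\n";                        # HASH
print length($empty->{logfile}), "\n";                       # 0
print defined($undef->{logfile}) ? "defined\n" : "undef\n";  # undef
```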

Why is ParserOpts deprecated?
    The "ParserOpts" option is a remnant of the time when XML::Simple only
    worked with the XML::Parser API. Its value is completely ignored if
    you're using a SAX parser, so writing code which relied on it would bar
    you from taking advantage of SAX.

    Even if you are using XML::Parser, it is seldom necessary to pass
    options to the parser object. A number of people have written to say
    they use this option to set XML::Parser's "ProtocolEncoding" option.
    Don't do that, it's wrong, Wrong, WRONG! Fix the XML document so that
    it's well-formed and you won't have a problem.

    Having said all of that, as long as XML::Simple continues to support
    the XML::Parser API, this option will not be removed. There are
    currently no plans to remove support for the XML::Parser API.

perl v5.16.3                      2012-06-20                XML::Simple::FAQ(3)