XML::LibXML::Simple(3)    User Contributed Perl Documentation    XML::LibXML::Simple(3)
2
3
4
NAME
6 XML::LibXML::Simple - XML::LibXML clone of XML::Simple::XMLin()
7
INHERITANCE
9 XML::LibXML::Simple
is an Exporter
11
SYNOPSIS
13 my $xml = ...; # filename, fh, string, or XML::LibXML-node
14
15 Imperative:
16
17 use XML::LibXML::Simple qw(XMLin);
18 my $data = XMLin $xml, %options;
19
20 Or the Object Oriented way:
21
22 use XML::LibXML::Simple ();
23 my $xs = XML::LibXML::Simple->new(%options);
24 my $data = $xs->XMLin($xml, %options);
25
DESCRIPTION
27 This module is a blunt rewrite of XML::Simple (by Grant McLean) to use
28 the XML::LibXML parser for XML structures, where the original uses
29 plain Perl or SAX parsers.
30
Be warned: this module tries to be smart. You may very well shoot
yourself in the foot with this DWIMmery. Read the whole manual page at
least once before you start using it. If your XML is described in a
schema or WSDL, then use XML::Compile for maintainable code.
35
METHODS
37 Constructors
38 XML::LibXML::Simple->new(%options)
Instantiate an object on which XMLin() can be called. You can
provide %options to this constructor (to be reused for each call
to XMLin) and with each call of XMLin (to be used once).
42
43 For descriptions of the %options see the "DETAILS" section of this
44 manual page.
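
The following is a minimal sketch of how constructor options and per-call
options combine, assuming the module is installed. The sample XML string
and the variable names are made up for this illustration.

```perl
use strict;
use warnings;
use XML::LibXML::Simple ();

# Options given to new() are reused for every XMLin() call on this object.
my $xs  = XML::LibXML::Simple->new(KeyAttr => []);
my $xml = '<config tempdir="/tmp" />';

# This call only uses the constructor options.
my $plain = $xs->XMLin($xml);

# KeepRoot is added for this single call only.
my $rooted = $xs->XMLin($xml, KeepRoot => 1);

print "plain:  $plain->{tempdir}\n";
print "rooted: $rooted->{config}{tempdir}\n";
```

Options which belong to the whole application fit the constructor, while
options which depend on one particular document fit the call.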
45
46 Translators
47 $obj->XMLin($xmldata, %options)
48 For $xmldata and descriptions of the %options see the "DETAILS"
49 section of this manual page.
50
FUNCTIONS
The functions "XMLin" (exported implicitly) and "xml_in" (exported on
request) simply call "XML::LibXML::Simple->new->XMLin()" with the
provided parameters.
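
As a small sketch of that equivalence, assuming the module is installed:
the three calls below are expected to produce the same structure. The
sample XML and the variable names are illustrative only.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin xml_in);   # xml_in must be requested

my $xml = '<opt username="bob" password="flurp" />';

my $via_XMLin  = XMLin($xml);                             # default export
my $via_xml_in = xml_in($xml);                            # on request
my $via_object = XML::LibXML::Simple->new->XMLin($xml);   # what both do

print join(', ', map { $_->{username} }
    $via_XMLin, $via_xml_in, $via_object), "\n";
```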
55
DETAILS
57 Parameter $xmldata
As first parameter to XMLin(), you must provide the XML message to be
translated into a Perl structure. Choose one of the following:
60
61 A filename
62 If the filename contains no directory components, "XMLin()" will
63 look for the file in each directory in the SearchPath (see OPTIONS
64 below) and in the current directory. eg:
65
66 $data = XMLin('/etc/params.xml', %options);
67
68 A dash (-)
69 Parse from STDIN.
70
71 $data = XMLin('-', %options);
72
73 undef
74 [deprecated] If there is no XML specifier, "XMLin()" will check the
75 script directory and each of the SearchPath directories for a file
76 with the same name as the script but with the extension '.xml'.
77 Note: if you wish to specify options, you must specify the value
78 'undef'. eg:
79
80 $data = XMLin(undef, ForceArray => 1);
81
This feature is available for backwards compatibility with
XML::Simple, but it is quite fragile: you can easily pick up the
wrong XML file as input. Please do not use it: always use an
explicit filename.
86
87 A string of XML
88 A string containing XML (recognised by the presence of '<' and '>'
89 characters) will be parsed directly. eg:
90
91 $data = XMLin('<opt username="bob" password="flurp" />', %options);
92
93 An IO::Handle object
In this case, XML::LibXML::Parser will read the XML data directly
from the provided file handle.
96
97 # $fh = IO::File->new('/etc/params.xml') or die;
98 open my $fh, '<:encoding(utf8)', '/etc/params.xml' or die;
99
100 $data = XMLin($fh, %options);
101
102 An XML::LibXML::Document or ::Element
103 [Not available in XML::Simple] When you have a pre-parsed
104 XML::LibXML node, you can pass that.
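
A minimal sketch of passing pre-parsed nodes, assuming XML::LibXML and this
module are installed. The document content, the XPath expression and the
variable names are invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

# Parse the document yourself, for instance because you need the DOM anyway.
my $doc = XML::LibXML->load_xml(
    string => '<site><opt username="bob" password="flurp" /></site>',
);

# Pass the whole document: its root element gets translated.
my $from_doc = XMLin($doc);

# Or pass a single element selected from the DOM.
my ($node)    = $doc->findnodes('/site/opt');
my $from_node = XMLin($node);

print "$from_doc->{opt}{username} $from_node->{password}\n";
```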
105
106 Parameter %options
107 XML::LibXML::Simple supports most options defined by XML::Simple, so
108 the interface is quite compatible. Minor changes apply. This
109 explanation is extracted from the XML::Simple manual-page.
110
111 · check out "ForceArray" because you'll almost certainly want to turn
112 it on
113
114 · make sure you know what the "KeyAttr" option does and what its
115 default value is because it may surprise you otherwise.
116
· Option names are case-insensitive, so you can use the mixed case
versions shown here; you can add underscores between the words (eg:
key_attr) if you like.
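
To illustrate the last point, here is a short sketch (assuming the module
is installed) in which the same option is spelled in three ways. The sample
XML and the variable names are illustrative.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<config tempdir="/tmp" />';

my $mixed      = XMLin($xml, KeepRoot  => 1);   # as shown in this manual
my $lower      = XMLin($xml, keeproot  => 1);   # all lower-case
my $underscore = XMLin($xml, keep_root => 1);   # with underscore

print "$_->{config}{tempdir}\n" for $mixed, $lower, $underscore;
```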
120
121 In alphabetic order:
122
123 ContentKey => 'keyname' # seldom used
124 When text content is parsed to a hash value, this option lets you
125 specify a name for the hash key to override the default 'content'.
126 So for example:
127
128 XMLin('<opt one="1">Two</opt>', ContentKey => 'text')
129
130 will parse to:
131
132 { one => 1, text => 'Two' }
133
134 instead of:
135
136 { one => 1, content => 'Two' }
137
138 You can also prefix your selected key name with a '-' character to
139 have "XMLin()" try a little harder to eliminate unnecessary
140 'content' keys after array folding. For example:
141
142 XMLin(
143 '<opt><item name="one">First</item><item name="two">Second</item></opt>',
144 KeyAttr => {item => 'name'},
145 ForceArray => [ 'item' ],
146 ContentKey => '-content'
147 )
148
149 will parse to:
150
    {
      item => {
        one => 'First',
        two => 'Second'
      }
    }
157
158 rather than this (without the '-'):
159
    {
      item => {
        one => { content => 'First' },
        two => { content => 'Second' }
      }
    }
166
167 ForceArray => 1 # important
168 This option should be set to '1' to force nested elements to be
169 represented as arrays even when there is only one. Eg, with
170 ForceArray enabled, this XML:
171
172 <opt>
173 <name>value</name>
174 </opt>
175
176 would parse to this:
177
178 { name => [ 'value' ] }
179
180 instead of this (the default):
181
182 { name => 'value' }
183
184 This option is especially useful if the data structure is likely to
185 be written back out as XML and the default behaviour of rolling
186 single nested elements up into attributes is not desirable.
187
188 If you are using the array folding feature, you should almost
189 certainly enable this option. If you do not, single nested
190 elements will not be parsed to arrays and therefore will not be
candidates for folding to a hash. (Given that the default value of
'KeyAttr' enables array folding, this option should probably have
been enabled by default as well.)
194
195 ForceArray => [ names ] # important
196 This alternative (and preferred) form of the 'ForceArray' option
197 allows you to specify a list of element names which should always
198 be forced into an array representation, rather than the 'all or
199 nothing' approach above.
200
It is also possible to include compiled regular expressions in the
list: any element names which match the pattern will be forced to
arrays. If the list contains only a single regex, then it is not
necessary to enclose it in an arrayref. Eg:
205
206 ForceArray => qr/_list$/
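
A small sketch of both forms, assuming the module is installed. The element
names "user_list" and "owner" are made up for this example.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><user_list>bob</user_list><owner>alice</owner></opt>';

# Force by element name: only 'user_list' becomes an array.
my $by_name  = XMLin($xml, ForceArray => ['user_list']);

# Force by pattern: every element whose name ends in '_list'.
my $by_regex = XMLin($xml, ForceArray => qr/_list$/);

print "first user: $by_name->{user_list}[0]\n";
print "owner:      $by_regex->{owner}\n";
```

In both cases "owner" is expected to stay a plain scalar, because it is
neither listed nor matched by the pattern.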
207
208 ForceContent => 1 # seldom used
209 When "XMLin()" parses elements which have text content as well as
210 attributes, the text content must be represented as a hash value
211 rather than a simple scalar. This option allows you to force text
212 content to always parse to a hash value even when there are no
213 attributes. So for example:
214
215 XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)
216
217 will parse to:
218
219 {
220 x => { content => 'text1' },
221 y => { a => 2, content => 'text2' }
222 }
223
224 instead of:
225
226 {
227 x => 'text1',
228 y => { 'a' => 2, 'content' => 'text2' }
229 }
230
231 GroupTags => { grouping tag => grouped tag } # handy
232 You can use this option to eliminate extra levels of indirection in
233 your Perl data structure. For example this XML:
234
235 <opt>
236 <searchpath>
237 <dir>/usr/bin</dir>
238 <dir>/usr/local/bin</dir>
239 <dir>/usr/X11/bin</dir>
240 </searchpath>
241 </opt>
242
243 Would normally be read into a structure like this:
244
245 {
246 searchpath => {
247 dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
248 }
249 }
250
251 But when read in with the appropriate value for 'GroupTags':
252
253 my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });
254
255 It will return this simpler structure:
256
257 {
258 searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
259 }
260
261 The grouping element ("<searchpath>" in the example) must not
262 contain any attributes or elements other than the grouped element.
263
264 You can specify multiple 'grouping element' to 'grouped element'
265 mappings in the same hashref. If this option is combined with
266 "KeyAttr", the array folding will occur first and then the grouped
267 element names will be eliminated.
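
Multiple mappings in one hashref could look like the sketch below, assuming
the module is installed. The "extensions" and "ext" elements are
hypothetical additions to the example above.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = <<'XML';
<opt>
  <searchpath>
    <dir>/usr/bin</dir>
    <dir>/usr/local/bin</dir>
  </searchpath>
  <extensions>
    <ext>.xml</ext>
    <ext>.conf</ext>
  </extensions>
</opt>
XML

# Two grouping elements, each with its own grouped element.
my $opt = XMLin($xml,
    GroupTags => { searchpath => 'dir', extensions => 'ext' },
);

print "dirs: @{$opt->{searchpath}}\n";
print "exts: @{$opt->{extensions}}\n";
```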
268
269 HookNodes => CODE
270 Select document nodes to apply special tricks. Introduced in
271 [0.96], not available in XML::Simple.
272
When this option is provided, the CODE will be called once the XML
DOM tree is ready to get transformed into Perl. Your CODE should
return either "undef" (nothing to do) or a HASH which maps values
of unique_key (see the XML::LibXML::Node method "unique_key") onto
CODE references to be called.

Once the translator from XML into Perl reaches a selected node, it
will call your routine specific for that node. The triggering node
is the only parameter. When you return "undef", the node will not
appear in the final result. You may return any data (even the node
itself), which will be included in the final result as is, under
the name of the original node.
285
286 Example:
287
    my $out = XMLin $file, HookNodes => \&protect_html;

    sub protect_html($$)
    {   # $obj is the instantiated XML::LibXML::Simple object
        # $xml is an XML::LibXML::Element to get transformed
        my ($obj, $xml) = @_;

        my %hooks;    # collects the table of hooks

        # do an xpath search for HTML
        my $xpc   = XML::LibXML::XPathContext->new($xml);
        my @nodes = $xpc->findnodes(...); #XXX
        @nodes or return undef;

        my $as_text = sub { $_[0]->toString(0) };  # as text
        #  $as_node = sub { $_[0] };               # as node
        #  $skip    = sub { undef };               # not at all

        # the same behavior for all xpath nodes, in this example
        $hooks{$_->unique_key} = $as_text
            for @nodes;

        \%hooks;
    }
312
313 KeepRoot => 1 # handy
314 In its attempt to return a data structure free of superfluous
315 detail and unnecessary levels of indirection, "XMLin()" normally
316 discards the root element name. Setting the 'KeepRoot' option to
317 '1' will cause the root element name to be retained. So after
318 executing this code:
319
320 $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)
321
322 You'll be able to reference the tempdir as
323 "$config->{config}->{tempdir}" instead of the default
324 "$config->{tempdir}".
325
326 KeyAttr => [ list ] # important
327 This option controls the 'array folding' feature which translates
328 nested elements from an array to a hash. It also controls the
329 'unfolding' of hashes to arrays.
330
331 For example, this XML:
332
333 <opt>
334 <user login="grep" fullname="Gary R Epstein" />
335 <user login="stty" fullname="Simon T Tyson" />
336 </opt>
337
338 would, by default, parse to this:
339
340 {
341 user => [
342 { login => 'grep',
343 fullname => 'Gary R Epstein'
344 },
345 { login => 'stty',
346 fullname => 'Simon T Tyson'
347 }
348 ]
349 }
350
351 If the option 'KeyAttr => "login"' were used to specify that the
352 'login' attribute is a key, the same XML would parse to:
353
354 {
355 user => {
356 stty => { fullname => 'Simon T Tyson' },
357 grep => { fullname => 'Gary R Epstein' }
358 }
359 }
360
361 The key attribute names should be supplied in an arrayref if there
362 is more than one. "XMLin()" will attempt to match attribute names
363 in the order supplied.
364
Note 1: The default value for 'KeyAttr' is "['name', 'key', 'id']".
If you do not want folding on input or unfolding on output you must
set this option to an empty list to disable the feature.
368
369 Note 2: If you wish to use this option, you should also enable the
370 "ForceArray" option. Without 'ForceArray', a single nested element
371 will be rolled up into a scalar rather than an array and therefore
372 will not be folded (since only arrays get folded).
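
The effect of Note 1 can be sketched as follows, assuming the module is
installed. The sample data reuses the users from above, but with a "name"
attribute so that the default key attributes apply.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = <<'XML';
<opt>
  <user name="grep" fullname="Gary R Epstein" />
  <user name="stty" fullname="Simon T Tyson" />
</opt>
XML

# Default KeyAttr: 'name' is a key attribute, so the array is folded.
my $folded   = XMLin($xml);

# Empty list: folding is disabled, the users stay in an array.
my $unfolded = XMLin($xml, KeyAttr => []);

print "folded:   $folded->{user}{grep}{fullname}\n";
print "unfolded: $unfolded->{user}[0]{fullname}\n";
```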
373
374 KeyAttr => { list } # important
This alternative (and preferred) method of specifying the key
attributes allows more fine grained control over which elements are
folded and on which attributes. For example the option 'KeyAttr =>
{ package => "id" }' will cause any package elements to be folded on
the 'id' attribute. No other elements which have an 'id' attribute
will be folded at all.
381
382 Two further variations are made possible by prefixing a '+' or a
383 '-' character to the attribute name:
384
385 The option 'KeyAttr => { user => "+login" }' will cause this XML:
386
387 <opt>
388 <user login="grep" fullname="Gary R Epstein" />
389 <user login="stty" fullname="Simon T Tyson" />
390 </opt>
391
392 to parse to this data structure:
393
394 {
395 user => {
396 stty => {
397 fullname => 'Simon T Tyson',
398 login => 'stty'
399 },
400 grep => {
401 fullname => 'Gary R Epstein',
402 login => 'grep'
403 }
404 }
405 }
406
407 The '+' indicates that the value of the key attribute should be
408 copied rather than moved to the folded hash key.
409
410 A '-' prefix would produce this result:
411
412 {
413 user => {
414 stty => {
415 fullname => 'Simon T Tyson',
416 -login => 'stty'
417 },
418 grep => {
419 fullname => 'Gary R Epstein',
420 -login => 'grep'
421 }
422 }
423 }
424
425 NoAttr => 1 # handy
426 When used with "XMLin()", any attributes in the XML will be
427 ignored.
428
429 NormaliseSpace => 0 | 1 | 2 # handy
430 This option controls how whitespace in text content is handled.
431 Recognised values for the option are:
432
433 "0" (default) whitespace is passed through unaltered (except of
434 course for the normalisation of whitespace in attribute values
435 which is mandated by the XML recommendation)
436
437 "1" whitespace is normalised in any value used as a hash key
438 (normalising means removing leading and trailing whitespace and
439 collapsing sequences of whitespace characters to a single
440 space)
441
442 "2" whitespace is normalised in all text content
443
444 Note: you can spell this option with a 'z' if that is more natural
445 for you.
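
The normalisation described for value "1" can be sketched in plain Perl.
The helper normalise_space() below is hypothetical and written for this
illustration only; it is not presented as the code the module uses.

```perl
use strict;
use warnings;

# Hypothetical helper: mirrors the description of normalisation above.
sub normalise_space {
    my ($text) = @_;
    $text =~ s/^\s+//;      # remove leading whitespace
    $text =~ s/\s+$//;      # remove trailing whitespace
    $text =~ s/\s+/ /g;     # collapse each inner run into a single space
    return $text;
}

my $clean = normalise_space("  Gary \t R\n  Epstein  ");
print "[$clean]\n";         # prints "[Gary R Epstein]"
```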
446
447 Parser => OBJECT
You may pass your own XML::LibXML object, instead of having one
created for you. This is useful when you need specific
configuration on that object (see XML::LibXML::Parser) or have
implemented your own extension to that object.

The internally created parser object is configured in safe mode.
Read the XML::LibXML::Parser manual about security issues with
certain parameter settings: the default configuration of
XML::LibXML itself is unsafe!
456
457 ParserOpts => HASH|ARRAY
Pass parameters to the creation of a new internal parser object.
You can overrule the options which will create a safe parser. It
may be more readable to use the "Parser" parameter.
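
A minimal sketch of passing your own parser with the "Parser" option,
assuming XML::LibXML and this module are installed. The restrictive
settings chosen here are only an example, not a complete security
recommendation; consult the XML::LibXML::Parser manual for your situation.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

# A parser you configure yourself gets no safe settings for free,
# so restrict it explicitly.
my $parser = XML::LibXML->new(
    no_network      => 1,   # never fetch remote resources
    load_ext_dtd    => 0,   # do not load external DTDs
    expand_entities => 0,   # leave entity references unexpanded
);

my $data = XMLin('<config tempdir="/tmp" />', Parser => $parser);
print "$data->{tempdir}\n";
```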
461
462 SearchPath => [ list ] # handy
If you pass "XMLin()" a filename, but the filename includes no
directory component, you can use this option to specify which
directories should be searched to locate the file. You might use
this option to search first in the user's home directory, then in a
global directory such as /etc.
468
469 If a filename is provided to "XMLin()" but SearchPath is not
470 defined, the file is assumed to be in the current directory.
471
472 If the first parameter to "XMLin()" is undefined, the default
473 SearchPath will contain only the directory in which the script
474 itself is located. Otherwise the default SearchPath will be empty.
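
The lookup can be tried out with a temporary directory, as in this sketch.
It assumes the module is installed; the file name "params.xml" and its
content are invented for the example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::LibXML::Simple qw(XMLin);

# Create a throw-away directory which contains the file to be found.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'params.xml');

open my $out, '>', $file or die "cannot write $file: $!";
print {$out} '<config tempdir="/tmp" />';
close $out or die "cannot close $file: $!";

# The filename has no directory component, so SearchPath is consulted.
my $data = XMLin('params.xml', SearchPath => [ $dir ]);
print "$data->{tempdir}\n";
```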
475
476 SuppressEmpty => 1 | '' | undef
477 [0.99] What to do with empty elements (no attributes and no
478 content). The default behaviour is to represent them as empty
479 hashes. Setting this option to a true value (eg: 1) will cause
480 empty elements to be skipped altogether. Setting the option to
481 'undef' or the empty string will cause empty elements to be
482 represented as the undefined value or the empty string
483 respectively.
484
485 ValueAttr => [ names ] # handy
Use this option to deal with elements which always have a single
attribute and no content. Eg:
488
489 <opt>
490 <colour value="red" />
491 <size value="XXL" />
492 </opt>
493
494 Setting "ValueAttr => [ 'value' ]" will cause the above XML to
495 parse to:
496
497 {
498 colour => 'red',
499 size => 'XXL'
500 }
501
502 instead of this (the default):
503
504 {
505 colour => { value => 'red' },
506 size => { value => 'XXL' }
507 }
508
NsExpand => 0 # advised
When namespaces are used, the default behavior is to include the
prefix in the key name. However, this is very dangerous: the
prefixes can be changed without changing the meaning of the XML
message. Therefore, you are better off using this "NsExpand"
option. The downside, however, is that the labels get very long.
515
516 Without this option:
517
518 <record xmlns:x="http://xyz">
519 <x:field1>42</x:field1>
520 </record>
521 <record xmlns:y="http://xyz">
522 <y:field1>42</y:field1>
523 </record>
524
525 translates into
526
527 { 'x:field1' => 42 }
528 { 'y:field1' => 42 }
529
but both source components have exactly the same meaning. When
"NsExpand" is used, the result is:
532
533 { '{http://xyz}field1' => 42 }
534 { '{http://xyz}field1' => 42 }
535
536 Of course, addressing these fields is more work. It is advised to
537 implement it like this:
538
539 my $ns = 'http://xyz';
540 $data->{"{$ns}field1"};
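
Put together as a runnable sketch, assuming the module is installed: two
records which differ only in their prefix are addressed with the same
expanded key. Other keys which may appear in the result are not examined
here.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $ns = 'http://xyz';

# Same namespace, different prefixes.
my $first  = XMLin(
    '<record xmlns:x="http://xyz"><x:field1>42</x:field1></record>',
    NsExpand => 1,
);
my $second = XMLin(
    '<record xmlns:y="http://xyz"><y:field1>42</y:field1></record>',
    NsExpand => 1,
);

# The key does not depend on the prefix used in the message.
my $key = "{$ns}field1";
print "$first->{$key} $second->{$key}\n";
```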
541
NsStrip => 0 # sloppy coding
[not available in XML::Simple] Namespaces are really important to
avoid name collisions, but they are a bit of a hassle. To do it
correctly, use option "NsExpand". To do it sloppily, use "NsStrip".
With this option set, the above example will return
547
548 { field1 => 42 }
549 { field1 => 42 }
550
EXAMPLES
552 When "XMLin()" reads the following very simple piece of XML:
553
554 <opt username="testuser" password="frodo"></opt>
555
556 it returns the following data structure:
557
558 {
559 username => 'testuser',
560 password => 'frodo'
561 }
562
563 The identical result could have been produced with this alternative
564 XML:
565
566 <opt username="testuser" password="frodo" />
567
568 Or this (although see 'ForceArray' option for variations):
569
570 <opt>
571 <username>testuser</username>
572 <password>frodo</password>
573 </opt>
574
575 Repeated nested elements are represented as anonymous arrays:
576
577 <opt>
578 <person firstname="Joe" lastname="Smith">
579 <email>joe@smith.com</email>
580 <email>jsmith@yahoo.com</email>
581 </person>
582 <person firstname="Bob" lastname="Smith">
583 <email>bob@smith.com</email>
584 </person>
585 </opt>
586
587 {
588 person => [
589 { email => [ 'joe@smith.com', 'jsmith@yahoo.com' ],
590 firstname => 'Joe',
591 lastname => 'Smith'
592 },
593 { email => 'bob@smith.com',
594 firstname => 'Bob',
595 lastname => 'Smith'
596 }
597 ]
598 }
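
In the structure above, Joe's email is an array while Bob's is a scalar.
The sketch below, which assumes the module is installed, shows how the
"ForceArray" option is expected to make both the same shape.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = <<'XML';
<opt>
  <person firstname="Joe" lastname="Smith">
    <email>joe@smith.com</email>
    <email>jsmith@yahoo.com</email>
  </person>
  <person firstname="Bob" lastname="Smith">
    <email>bob@smith.com</email>
  </person>
</opt>
XML

# Always represent <email> as an array, even when there is only one.
my $data = XMLin($xml, ForceArray => ['email']);

for my $person (@{$data->{person}}) {
    print "$person->{firstname}: @{$person->{email}}\n";
}
```

Code which processes the result then needs no special case for people with
a single address.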
599
600 Nested elements with a recognised key attribute are transformed
601 (folded) from an array into a hash keyed on the value of that attribute
602 (see the "KeyAttr" option):
603
604 <opt>
605 <person key="jsmith" firstname="Joe" lastname="Smith" />
606 <person key="tsmith" firstname="Tom" lastname="Smith" />
607 <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
608 </opt>
609
610 {
611 person => {
612 jbloggs => {
613 firstname => 'Joe',
614 lastname => 'Bloggs'
615 },
616 tsmith => {
617 firstname => 'Tom',
618 lastname => 'Smith'
619 },
620 jsmith => {
621 firstname => 'Joe',
622 lastname => 'Smith'
623 }
624 }
625 }
626
627 The <anon> tag can be used to form anonymous arrays:
628
629 <opt>
630 <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
631 <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
632 <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
633 <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
634 </opt>
635
636 {
637 head => [ [ 'Col 1', 'Col 2', 'Col 3' ] ],
638 data => [ [ 'R1C1', 'R1C2', 'R1C3' ],
639 [ 'R2C1', 'R2C2', 'R2C3' ],
640 [ 'R3C1', 'R3C2', 'R3C3' ]
641 ]
642 }
643
Anonymous arrays can be nested to arbitrary levels and as a special
case, if the surrounding tags for an XML document contain only an
anonymous array the arrayref will be returned directly rather than the
usual hashref:
648
649 <opt>
650 <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
651 <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
652 <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
653 </opt>
654
655 [
656 [ 'Col 1', 'Col 2' ],
657 [ 'R1C1', 'R1C2' ],
658 [ 'R2C1', 'R2C2' ]
659 ]
660
661 Elements which only contain text content will simply be represented as
662 a scalar. Where an element has both attributes and text content, the
663 element will be represented as a hashref with the text content in the
664 'content' key (see the "ContentKey" option):
665
666 <opt>
667 <one>first</one>
668 <two attr="value">second</two>
669 </opt>
670
671 {
672 one => 'first',
673 two => { attr => 'value', content => 'second' }
674 }
675
Mixed content (elements which contain both text content and nested
elements) will not be represented in a useful way - element order
and significant whitespace will be lost. If you need to work with
mixed content, then XML::Simple is not the right tool for your job -
check out the next section.
681
682 Differences to XML::Simple
683 In general, the output and the options are equivalent, although this
684 module has some differences with XML::Simple to be aware of.
685
686 only XMLin() is supported
687 If you want to write XML then use a schema (for instance with
688 XML::Compile). Do not attempt to create XML by hand! If you still
689 think you need it, then have a look at XMLout() as implemented by
690 XML::Simple or any of a zillion template systems.
691
692 no "variables" option
693 IMO, you should use a templating system if you want variables
694 filled-in in the input: it is not a task for this module.
695
696 ForceArray options
There are a few small differences in the result of the "forcearray"
option, because XML::Simple seems to behave inconsistently.
699
700 hooks
701 XML::Simple does not support hooks.
702
SEE ALSO
704 XML::Compile for processing XML when a schema is available. When you
705 have a schema, the data and structure of your message get validated.
706
XML::Simple, the original implementation whose interface is followed as
closely as possible.
709
COPYRIGHTS
711 The interface design and large parts of the documentation were taken
712 from the XML::Simple module, written by Grant McLean <grantm@cpan.org>
713
Copyrights 2008-2017 on the perl code and the related documentation
by [Mark Overmeer]. For other contributors see ChangeLog.
716
717 This program is free software; you can redistribute it and/or modify it
718 under the same terms as Perl itself. See http://dev.perl.org/licenses/
719
720
721
722perl v5.30.0 2019-07-26 XML::LibXML::Simple(3)