XML::LibXML::Simple(3)  User Contributed Perl Documentation  XML::LibXML::Simple(3)


NAME
       XML::LibXML::Simple - XML::LibXML clone of XML::Simple::XMLin()

INHERITANCE
        XML::LibXML::Simple
          is an Exporter

SYNOPSIS
           my $xml  = ...;   # filename, fh, string, or XML::LibXML-node

       Imperative:

           use XML::LibXML::Simple   qw(XMLin);
           my $data = XMLin $xml, %options;

       Or the Object Oriented way:

           use XML::LibXML::Simple   ();
           my $xs   = XML::LibXML::Simple->new(%options);
           my $data = $xs->XMLin($xml, %options);

DESCRIPTION
       This module is a blunt rewrite of XML::Simple (by Grant McLean) to use
       the XML::LibXML parser for XML structures, where the original uses
       plain Perl or SAX parsers.

       Be warned: this module tries to be smart. You may very well shoot
       yourself in the foot with this DWIMmery. Read the whole manual page at
       least once before you start using it. If your XML is described in a
       schema or WSDL, then use XML::Compile for maintainable code.

METHODS
   Constructors
       XML::LibXML::Simple->new(%options)
           Instantiate an object, which can be used to call XMLin() on. You
           can provide %options to this constructor (reused for every call to
           XMLin()) and to each call of XMLin() (used for that call only).

           For descriptions of the %options see the "DETAILS" section of this
           manual page.

   Translators
       $obj->XMLin($xmldata, %options)
           For $xmldata and descriptions of the %options see the "DETAILS"
           section of this manual page.
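A minimal sketch of how constructor options and per-call options combine, assuming XML::LibXML::Simple is installed. The XML string and the variable names are made up for illustration; the expected results follow from the "ForceArray" and "KeyAttr" descriptions under "DETAILS".

```perl
use strict;
use warnings;
use XML::LibXML::Simple ();

my $xml = '<opt><server name="alpha" port="80"/></opt>';

# Options given to new() are reused by every XMLin() call on this object.
my $xs = XML::LibXML::Simple->new(ForceArray => 1);

# ForceArray => 1 plus the default KeyAttr folds <server> on 'name'.
my $folded = $xs->XMLin($xml);

# An option passed to XMLin() applies to this call only: no folding here.
my $flat = $xs->XMLin($xml, KeyAttr => []);

# The next call folds again: the KeyAttr => [] above was used only once.
my $again = $xs->XMLin($xml);
```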

FUNCTIONS
       The functions "XMLin" (exported implicitly) and "xml_in" (exported on
       request) simply call "XML::LibXML::Simple->new->XMLin()" with the
       provided parameters.
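A short sketch showing that the two functions and the explicit object call are interchangeable, assuming XML::LibXML::Simple is installed; the sample XML is the one used for "A string of XML" below.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin xml_in);   # xml_in must be requested

my $xml = '<opt username="bob" password="flurp" />';

# Each function call creates a fresh object internally ...
my $via_XMLin  = XMLin($xml);
my $via_xml_in = xml_in($xml);

# ... which amounts to this explicit object-oriented call.
my $via_object = XML::LibXML::Simple->new->XMLin($xml);

print "$_->{username}\n" for $via_XMLin, $via_xml_in, $via_object;
```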

DETAILS
   Parameter $xmldata
       The first parameter to XMLin() must provide the XML message to be
       translated into a Perl structure. Choose one of the following:

       A filename
           If the filename contains no directory components, XMLin() will look
           for the file in each directory in the SearchPath (see the
           "SearchPath" option below) and in the current directory. For
           example:

               $data = XMLin('/etc/params.xml', %options);

       A dash (-)
           Parse from STDIN.

               $data = XMLin('-', %options);

       undef
           [deprecated] If there is no XML specifier, XMLin() will check the
           script directory and each of the SearchPath directories for a file
           with the same name as the script but with the extension '.xml'.
           Note: if you wish to specify options, you must specify the value
           'undef'. For example:

               $data = XMLin(undef, ForceArray => 1);

           This feature is available for backwards compatibility with
           XML::Simple, but it is fragile: you can easily pick up the wrong
           XML file as input. Please do not use it: always use an explicit
           filename.

       A string of XML
           A string containing XML (recognised by the presence of '<' and '>'
           characters) will be parsed directly. For example:

               $data = XMLin('<opt username="bob" password="flurp" />', %options);

       An IO::Handle object
           In this case, XML::LibXML::Parser will read the XML data directly
           from the provided file handle.

               # $fh = IO::File->new('/etc/params.xml') or die;
               open my $fh, '<:encoding(utf8)', '/etc/params.xml' or die;

               $data = XMLin($fh, %options);

       An XML::LibXML::Document or ::Element
           [Not available in XML::Simple] When you have a pre-parsed
           XML::LibXML node, you can pass that.
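A minimal sketch of passing a pre-parsed node, assuming XML::LibXML::Simple is installed. It uses XML::LibXML's own load_xml() to build the document; the sample XML is illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

# Parse once, for instance to inspect or validate the DOM yourself first.
my $doc = XML::LibXML->load_xml(string => '<opt><colour>red</colour></opt>');

my $from_doc  = XMLin($doc);                    # an XML::LibXML::Document
my $from_elem = XMLin($doc->documentElement);   # an XML::LibXML::Element
```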

   Parameter %options
       XML::LibXML::Simple supports most options defined by XML::Simple, so
       the interface is quite compatible. Minor changes apply. This
       explanation is extracted from the XML::Simple manual page.

       •   Check out "ForceArray" because you'll almost certainly want to turn
           it on.

       •   Make sure you know what the "KeyAttr" option does and what its
           default value is, because it may surprise you otherwise.

       •   Option names are case-insensitive, so you can use the mixed case
           versions shown here; you can add underscores between the words
           (e.g. key_attr) if you like.

       The options are listed in alphabetic order, except for the namespace
       options "NsExpand" and "NsStrip", which come last:

       ContentKey => 'keyname'  # seldom used
           When text content is parsed to a hash value, this option lets you
           specify a name for the hash key to override the default 'content'.
           So for example:

               XMLin('<opt one="1">Two</opt>', ContentKey => 'text')

           will parse to:

               { one => 1, text => 'Two' }

           instead of:

               { one => 1, content => 'Two' }

           You can also prefix your selected key name with a '-' character to
           have XMLin() try a little harder to eliminate unnecessary 'content'
           keys after array folding. For example:

               XMLin(
                 '<opt><item name="one">First</item><item name="two">Second</item></opt>',
                 KeyAttr    => {item => 'name'},
                 ForceArray => [ 'item' ],
                 ContentKey => '-content'
               )

           will parse to:

               {
                 item => {
                   one => 'First',
                   two => 'Second'
                 }
               }

           rather than this (without the '-'):

               {
                 item => {
                   one => { content => 'First' },
                   two => { content => 'Second' }
                 }
               }

       ForceArray => 1  # important
           This option should be set to '1' to force nested elements to be
           represented as arrays even when there is only one. For example,
           with ForceArray enabled, this XML:

               <opt>
                 <name>value</name>
               </opt>

           would parse to this:

               { name => [ 'value' ] }

           instead of this (the default):

               { name => 'value' }

           This option is especially useful if the data structure is likely to
           be written back out as XML and the default behaviour of rolling
           single nested elements up into attributes is not desirable.

           If you are using the array folding feature, you should almost
           certainly enable this option. If you do not, single nested
           elements will not be parsed to arrays and therefore will not be
           candidates for folding to a hash. (Given that the default value of
           'KeyAttr' enables array folding, the default value of this option
           should probably have been enabled as well.)

       ForceArray => [ names ]  # important
           This alternative (and preferred) form of the 'ForceArray' option
           allows you to specify a list of element names which should always
           be forced into an array representation, rather than the 'all or
           nothing' approach above.

           It is also possible to include compiled regular expressions in the
           list: any element names which match the pattern will be forced to
           arrays. If the list contains only a single regex, then it is not
           necessary to enclose it in an arrayref. For example:

               ForceArray => qr/_list$/
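A small sketch mixing element names and a regex in the ForceArray list, assuming XML::LibXML::Simple is installed; the element names host_list and owner are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><host_list>alpha</host_list><owner>root</owner></opt>';

# Names and compiled regexes may be mixed in the list.
my $both = XMLin($xml, ForceArray => [ 'owner', qr/_list$/ ]);

# A single regex does not need the surrounding arrayref;
# 'owner' does not match, so it stays a plain scalar.
my $regex_only = XMLin($xml, ForceArray => qr/_list$/);
```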

       ForceContent => 1  # seldom used
           When XMLin() parses elements which have text content as well as
           attributes, the text content must be represented as a hash value
           rather than a simple scalar. This option allows you to force text
           content to always parse to a hash value even when there are no
           attributes. So for example:

               XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)

           will parse to:

               {
                 x => { content => 'text1' },
                 y => { a => 2, content => 'text2' }
               }

           instead of:

               {
                 x => 'text1',
                 y => { 'a' => 2, 'content' => 'text2' }
               }

       GroupTags => { grouping tag => grouped tag }  # handy
           You can use this option to eliminate extra levels of indirection in
           your Perl data structure. For example this XML:

               <opt>
                 <searchpath>
                   <dir>/usr/bin</dir>
                   <dir>/usr/local/bin</dir>
                   <dir>/usr/X11/bin</dir>
                 </searchpath>
               </opt>

           would normally be read into a structure like this:

               {
                 searchpath => {
                   dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
                 }
               }

           But when read in with the appropriate value for 'GroupTags':

               my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });

           it will return this simpler structure:

               {
                 searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
               }

           The grouping element ("<searchpath>" in the example) must not
           contain any attributes or elements other than the grouped element.

           You can specify multiple 'grouping element' to 'grouped element'
           mappings in the same hashref. If this option is combined with
           "KeyAttr", the array folding will occur first and then the grouped
           element names will be eliminated.

       HookNodes => CODE
           Select document nodes to apply special tricks. Introduced in
           [0.96], not available in XML::Simple.

           When this option is provided, the CODE will be called once the XML
           DOM tree is ready to get transformed into Perl. It receives the
           XML::LibXML::Simple object and the top XML node. Your CODE should
           return either "undef" (nothing to do) or a reference to a HASH
           which maps values of unique_key (see the XML::LibXML::Node method
           "unique_key") onto CODE references to be called.

           Once the translator from XML into Perl reaches a selected node, it
           will call your routine specific for that node, passing that node as
           the only parameter. When you return "undef", the node will not
           appear in the final result. You may return any data (even the node
           itself), which will be included in the final result as is, under
           the name of the original node.

           Example:

               my $out = XMLin $file, HookNodes => \&protect_html;

               sub protect_html($$)
               {   # $obj is the instantiated XML::LibXML::Simple object
                   # $xml is an XML::LibXML::Element to get transformed
                   my ($obj, $xml) = @_;

                   my %hook;    # collects the table of hooks

                   # do an XPath search for HTML
                   my $xpc   = XML::LibXML::XPathContext->new($xml);
                   my @nodes = $xpc->findnodes(...);  # your XPath expression
                   @nodes or return undef;

                   my $as_text = sub { $_[0]->toString(0) };  # as text
                 # my $as_node = sub { $_[0] };               # as node
                 # my $skip    = sub { undef };               # not at all

                   # the same behavior for all XPath nodes, in this example
                   $hook{$_->unique_key} = $as_text
                       for @nodes;

                   \%hook;
               }
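A self-contained variant of the example above, assuming XML::LibXML::Simple 0.96 or later is installed. The callback name keep_body_as_xml, the XPath expression //body and the sample document are hypothetical; as described above, the callback gets the object and the top node and returns a table of hooks keyed on unique_key.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><title>Report</title><body><b>bold</b> text</body></opt>';

# Hypothetical callback: keep each <body> as serialized XML text instead
# of translating its mixed content into a Perl structure.
sub keep_body_as_xml
{   my ($obj, $top) = @_;    # XML::LibXML::Simple object, top node

    my %hook;
    $hook{$_->unique_key} = sub { $_[0]->toString(0) }
        for $top->findnodes('//body');

    keys %hook ? \%hook : undef;   # undef means: nothing to do
}

my $data = XMLin($xml, HookNodes => \&keep_body_as_xml);
print "$data->{title}: $data->{body}\n";
```

The '//body' path is absolute, so it matches whether the top node handed to the callback is the document or its root element.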

       KeepRoot => 1  # handy
           In its attempt to return a data structure free of superfluous
           detail and unnecessary levels of indirection, XMLin() normally
           discards the root element name. Setting the 'KeepRoot' option to
           '1' will cause the root element name to be retained. So after
           executing this code:

               $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)

           you'll be able to reference the tempdir as
           "$config->{config}->{tempdir}" instead of the default
           "$config->{tempdir}".

       KeyAttr => [ list ]  # important
           This option controls the 'array folding' feature which translates
           nested elements from an array to a hash. (In XML::Simple it also
           controls the 'unfolding' of hashes back to arrays by XMLout(),
           which this module does not implement.)

           For example, this XML:

               <opt>
                 <user login="grep" fullname="Gary R Epstein" />
                 <user login="stty" fullname="Simon T Tyson" />
               </opt>

           would, by default, parse to this:

               {
                 user => [
                   { login    => 'grep',
                     fullname => 'Gary R Epstein'
                   },
                   { login    => 'stty',
                     fullname => 'Simon T Tyson'
                   }
                 ]
               }

           If the option 'KeyAttr => "login"' were used to specify that the
           'login' attribute is a key, the same XML would parse to:

               {
                 user => {
                   stty => { fullname => 'Simon T Tyson' },
                   grep => { fullname => 'Gary R Epstein' }
                 }
               }

           The key attribute names should be supplied in an arrayref if there
           is more than one. XMLin() will attempt to match attribute names in
           the order supplied.

           Note 1: The default value for 'KeyAttr' is "['name', 'key', 'id']".
           If you do not want folding, you must set this option to an empty
           list to disable the feature.

           Note 2: If you wish to use this option, you should also enable the
           "ForceArray" option. Without 'ForceArray', a single nested element
           will not be stored in an array and therefore will not be folded
           (since only arrays get folded).
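A sketch of both notes above, assuming XML::LibXML::Simple is installed and the default KeyAttr of ['name', 'key', 'id'] stated in Note 1; the user records are made up.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $two = '<opt><user id="grep" fullname="Gary"/><user id="stty" fullname="Simon"/></opt>';
my $one = '<opt><user id="grep" fullname="Gary"/></opt>';

# Default KeyAttr: repeated <user> elements are folded on 'id'.
my $folded = XMLin($two);

# Note 1: an empty list disables folding, leaving the array of hashes.
my $plain  = XMLin($two, KeyAttr => []);

# Note 2: a lone <user> stays a plain hash with 'id' inside ...
my $single = XMLin($one);

# ... and is only folded once it is forced into an array.
my $forced = XMLin($one, ForceArray => ['user']);
```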

       KeyAttr => { list }  # important
           This alternative (and preferred) method of specifying the key
           attributes allows more fine-grained control over which elements are
           folded and on which attributes. For example the option
           'KeyAttr => { package => "id" }' will cause any package elements to
           be folded on the 'id' attribute. No other elements which have an
           'id' attribute will be folded at all.

           Two further variations are made possible by prefixing a '+' or a
           '-' character to the attribute name:

           The option 'KeyAttr => { user => "+login" }' will cause this XML:

               <opt>
                 <user login="grep" fullname="Gary R Epstein" />
                 <user login="stty" fullname="Simon T Tyson" />
               </opt>

           to parse to this data structure:

               {
                 user => {
                   stty => {
                     fullname => 'Simon T Tyson',
                     login    => 'stty'
                   },
                   grep => {
                     fullname => 'Gary R Epstein',
                     login    => 'grep'
                   }
                 }
               }

           The '+' indicates that the value of the key attribute should be
           copied rather than moved to the folded hash key.

           A '-' prefix would produce this result:

               {
                 user => {
                   stty => {
                     fullname => 'Simon T Tyson',
                     -login   => 'stty'
                   },
                   grep => {
                     fullname => 'Gary R Epstein',
                     -login   => 'grep'
                   }
                 }
               }

       NoAttr => 1  # handy
           When used with XMLin(), any attributes in the XML will be ignored.
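A minimal sketch of NoAttr, assuming XML::LibXML::Simple is installed; the sample document is illustrative.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt version="2"><name lang="en">Joe</name></opt>';

# Attributes kept: <name> has both an attribute and text, so it
# becomes a hash with a 'content' key.
my $with = XMLin($xml);

# Attributes dropped: <name> only has text left, so it becomes a scalar.
my $without = XMLin($xml, NoAttr => 1);
```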

       NormaliseSpace => 0 | 1 | 2  # handy
           This option controls how whitespace in text content is handled.
           Recognised values for the option are:

           "0"  (default) whitespace is passed through unaltered (except of
                course for the normalisation of whitespace in attribute values
                which is mandated by the XML recommendation)

           "1"  whitespace is normalised in any value used as a hash key
                (normalising means removing leading and trailing whitespace and
                collapsing sequences of whitespace characters to a single
                space)

           "2"  whitespace is normalised in all text content

           Note: you can spell this option with a 'z' if that is more natural
           for you.
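A short sketch contrasting levels 0 and 2 on text content, assuming XML::LibXML::Simple is installed; the expected values follow the definitions above.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = "<opt><note>  lots   of\n  space  </note></opt>";

my $raw  = XMLin($xml);                       # 0: text passed through unaltered
my $tidy = XMLin($xml, NormaliseSpace => 2);  # 2: all text content normalised
```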

       Parser => OBJECT
           You may pass your own XML::LibXML object, instead of having one
           created for you. This is useful when you need specific
           configuration on that object (see XML::LibXML::Parser) or have
           implemented your own extension to that object.

           The internally created parser object is configured in safe mode.
           Read the XML::LibXML::Parser manual about security issues with
           certain parameter settings: the default settings of a parser you
           create yourself are unsafe!
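A sketch of passing a hand-configured parser, assuming XML::LibXML::Simple is installed. The three parser options shown are standard XML::LibXML::Parser settings chosen here as an example, not the exact configuration this module uses internally; check the XML::LibXML::Parser manual before relying on them.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

# An example configuration that avoids fetching external resources.
my $parser = XML::LibXML->new(
    no_network      => 1,   # never access the network
    expand_entities => 0,   # do not expand external entities
    load_ext_dtd    => 0,   # do not load external DTDs
);

my $data = XMLin('<opt colour="red"/>', Parser => $parser);
```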

       ParserOpts => HASH|ARRAY
           Pass parameters to the creation of a new internal parser object.
           You can use them to override the options which otherwise create a
           safe parser. It may be more readable to use the "Parser" parameter.

       SearchPath => [ list ]  # handy
           If you pass XMLin() a filename, but the filename includes no
           directory component, you can use this option to specify which
           directories should be searched to locate the file. You might use
           this option to search first in the user's home directory, then in a
           global directory such as /etc.

           If a filename is provided to XMLin() but SearchPath is not defined,
           the file is assumed to be in the current directory.

           If the first parameter to XMLin() is undefined, the default
           SearchPath will contain only the directory in which the script
           itself is located. Otherwise the default SearchPath will be empty.
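A self-contained sketch of SearchPath, assuming XML::LibXML::Simple is installed. It writes a throw-away params.xml into a temporary directory (an illustration only) and then names the file without any directory component.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::LibXML::Simple qw(XMLin);

# Create a temporary directory holding a small configuration file.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'params.xml');
open my $out, '>', $path or die "cannot write $path: $!";
print $out '<opt username="bob"/>';
close $out or die "cannot close $path: $!";

# A bare filename: XMLin() looks for it in each SearchPath directory.
my $data = XMLin('params.xml', SearchPath => [ $dir ]);
```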

       SuppressEmpty => 1 | '' | undef
           [0.99] What to do with empty elements (no attributes and no
           content). The default behaviour is to represent them as empty
           hashes. Setting this option to a true value (e.g. 1) will cause
           empty elements to be skipped altogether. Setting the option to
           'undef' or the empty string will cause empty elements to be
           represented as the undefined value or the empty string
           respectively.
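A sketch of the four behaviours described above, assuming XML::LibXML::Simple 0.99 or later is installed; the comments restate the documented result for the empty element <a/>.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><a/><b>x</b></opt>';

my $default = XMLin($xml);                          # a => {}
my $skipped = XMLin($xml, SuppressEmpty => 1);      # no key 'a' at all
my $undef   = XMLin($xml, SuppressEmpty => undef);  # a => undef
my $empty   = XMLin($xml, SuppressEmpty => '');     # a => ''
```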

       ValueAttr => [ names ]  # handy
           Use this option to deal with elements which always have a single
           attribute and no content. For example:

               <opt>
                 <colour value="red" />
                 <size   value="XXL" />
               </opt>

           Setting "ValueAttr => [ 'value' ]" will cause the above XML to
           parse to:

               {
                 colour => 'red',
                 size   => 'XXL'
               }

           instead of this (the default):

               {
                 colour => { value => 'red' },
                 size   => { value => 'XXL' }
               }

       NsExpand => 0  # advised
           When namespaces are used, the default behavior is to include the
           prefix in the key name. However, this is very dangerous: the
           prefixes can be changed without a change of the XML message
           meaning. Therefore, you had better use this "NsExpand" option. The
           downside, however, is that the labels get very long.

           Without this option:

               <record xmlns:x="http://xyz">
                 <x:field1>42</x:field1>
               </record>
               <record xmlns:y="http://xyz">
                 <y:field1>42</y:field1>
               </record>

           translates into

               { 'x:field1' => 42 }
               { 'y:field1' => 42 }

           but both source components have exactly the same meaning. When
           "NsExpand" is used, the result is:

               { '{http://xyz}field1' => 42 }
               { '{http://xyz}field1' => 42 }

           Of course, addressing these fields is more work. It is advised to
           implement it like this:

               my $ns = 'http://xyz';
               $data->{"{$ns}field1"};

       NsStrip => 0  # sloppy coding
           [not available in XML::Simple] Namespaces are really important to
           avoid name collisions, but they are a bit of a hassle. To do it
           correctly, use option "NsExpand". To do it sloppily, use "NsStrip".
           With this option set, the above example will return

               { field1 => 42 }
               { field1 => 42 }
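A runnable version of the two namespace examples above, assuming XML::LibXML::Simple is installed; it checks that both prefixes give the same key once "NsExpand" or "NsStrip" is set.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $ns = 'http://xyz';
my @messages = (
    qq{<record xmlns:x="$ns"><x:field1>42</x:field1></record>},
    qq{<record xmlns:y="$ns"><y:field1>42</y:field1></record>},
);

my (@expanded, @stripped);
for my $xml (@messages)
{   push @expanded, XMLin($xml, NsExpand => 1);  # key '{http://xyz}field1'
    push @stripped, XMLin($xml, NsStrip  => 1);  # key 'field1'
}

print $_->{"{$ns}field1"}, "\n" for @expanded;
```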

EXAMPLES
       When XMLin() reads the following very simple piece of XML:

           <opt username="testuser" password="frodo"></opt>

       it returns the following data structure:

           {
             username => 'testuser',
             password => 'frodo'
           }

       The identical result could have been produced with this alternative
       XML:

           <opt username="testuser" password="frodo" />

       Or this (although see 'ForceArray' option for variations):

           <opt>
             <username>testuser</username>
             <password>frodo</password>
           </opt>

       Repeated nested elements are represented as anonymous arrays:

           <opt>
             <person firstname="Joe" lastname="Smith">
               <email>joe@smith.com</email>
               <email>jsmith@yahoo.com</email>
             </person>
             <person firstname="Bob" lastname="Smith">
               <email>bob@smith.com</email>
             </person>
           </opt>

           {
             person => [
               { email     => [ 'joe@smith.com', 'jsmith@yahoo.com' ],
                 firstname => 'Joe',
                 lastname  => 'Smith'
               },
               { email     => 'bob@smith.com',
                 firstname => 'Bob',
                 lastname  => 'Smith'
               }
             ]
           }
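Note how Bob's single address comes back as a plain scalar while Joe's are an array. A sketch of the usual remedy, assuming XML::LibXML::Simple is installed: forcing 'email' into an array lets code treat every person the same way.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = <<'__XML';
<opt>
  <person firstname="Joe" lastname="Smith">
    <email>joe@smith.com</email>
    <email>jsmith@yahoo.com</email>
  </person>
  <person firstname="Bob" lastname="Smith">
    <email>bob@smith.com</email>
  </person>
</opt>
__XML

my $plain  = XMLin($xml);                          # Bob's email is a scalar
my $forced = XMLin($xml, ForceArray => ['email']); # every email is an array

for my $person (@{ $forced->{person} })
{   print "$person->{firstname}: @{ $person->{email} }\n";
}
```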

       Nested elements with a recognised key attribute are transformed
       (folded) from an array into a hash keyed on the value of that attribute
       (see the "KeyAttr" option):

           <opt>
             <person key="jsmith"  firstname="Joe" lastname="Smith" />
             <person key="tsmith"  firstname="Tom" lastname="Smith" />
             <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
           </opt>

           {
             person => {
               jbloggs => {
                 firstname => 'Joe',
                 lastname  => 'Bloggs'
               },
               tsmith => {
                 firstname => 'Tom',
                 lastname  => 'Smith'
               },
               jsmith => {
                 firstname => 'Joe',
                 lastname  => 'Smith'
               }
             }
           }

       The <anon> tag can be used to form anonymous arrays:

           <opt>
             <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
             <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
             <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
             <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
           </opt>

           {
             head => [ [ 'Col 1', 'Col 2', 'Col 3' ] ],
             data => [ [ 'R1C1', 'R1C2', 'R1C3' ],
                       [ 'R2C1', 'R2C2', 'R2C3' ],
                       [ 'R3C1', 'R3C2', 'R3C3' ]
                     ]
           }

       Anonymous arrays can be nested to arbitrary levels and, as a special
       case, if the surrounding tags for an XML document contain only an
       anonymous array, the arrayref will be returned directly rather than the
       usual hashref:

           <opt>
             <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
             <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
             <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
           </opt>

           [
             [ 'Col 1', 'Col 2' ],
             [ 'R1C1', 'R1C2' ],
             [ 'R2C1', 'R2C2' ]
           ]

       Elements which only contain text content will simply be represented as
       a scalar. Where an element has both attributes and text content, the
       element will be represented as a hashref with the text content in the
       'content' key (see the "ContentKey" option):

           <opt>
             <one>first</one>
             <two attr="value">second</two>
           </opt>

           {
             one => 'first',
             two => { attr => 'value', content => 'second' }
           }

       Mixed content (elements which contain both text content and nested
       elements) will not be represented in a useful way: element order and
       significant whitespace will be lost. If you need to work with mixed
       content, then this module, like XML::Simple, is not the right tool for
       your job.

   Differences to XML::Simple
       In general, the output and the options are equivalent, although this
       module has some differences from XML::Simple to be aware of.

       only XMLin() is supported
           If you want to write XML then use a schema (for instance with
           XML::Compile). Do not attempt to create XML by hand! If you still
           think you need it, then have a look at XMLout() as implemented by
           XML::Simple or any of a zillion template systems.

       no "variables" option
           IMO, you should use a templating system if you want variables
           filled in in the input: it is not a task for this module.

       ForceArray options
           There are a few small differences in the result of the "forcearray"
           option, because XML::Simple seems to behave inconsistently.

       hooks
           XML::Simple does not support hooks.

SEE ALSO
       XML::Compile for processing XML when a schema is available. When you
       have a schema, the data and structure of your message get validated.

       XML::Simple, the original implementation whose interface is followed
       as closely as possible.

COPYRIGHTS
       The interface design and large parts of the documentation were taken
       from the XML::Simple module, written by Grant McLean <grantm@cpan.org>.

       Copyrights of the perl code and the related documentation 2008-2020
       by [Mark Overmeer <markov@cpan.org>]. For other contributors see
       ChangeLog.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself. See http://dev.perl.org/licenses/


perl v5.36.0                      2023-01-20            XML::LibXML::Simple(3)