XML::LibXML::Simple(3)  User Contributed Perl Documentation  XML::LibXML::Simple(3)

NAME

       XML::LibXML::Simple - XML::LibXML clone of XML::Simple::XMLin()

INHERITANCE

        XML::LibXML::Simple
          is an Exporter

SYNOPSIS

         my $xml  = ...;  # filename, fh, string, or XML::LibXML-node

       Imperative:

         use XML::LibXML::Simple   qw(XMLin);
         my $data = XMLin $xml, %options;

       Or the Object Oriented way:

         use XML::LibXML::Simple   ();
         my $xs   = XML::LibXML::Simple->new(%options);
         my $data = $xs->XMLin($xml, %options);

DESCRIPTION

       This module is a blunt rewrite of XML::Simple (by Grant McLean) to use
       the XML::LibXML parser for XML structures, where the original uses
       plain Perl or SAX parsers.

       Be warned: this module tries to be smart.  You may very well shoot
       yourself in the foot with this DWIMmery.  Read the whole manual page at
       least once before you start using it.  If your XML is described in a
       schema or WSDL, then use XML::Compile for maintainable code.

METHODS

   Constructors
       XML::LibXML::Simple->new(%options)
           Instantiate an object, which can be used to call XMLin() on.  You
           can provide %options to this constructor (to be reused for each
           call to XMLin) and with each call of XMLin (to be used once).

           For descriptions of the %options see the "DETAILS" section of this
           manual page.

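           The two places where options can be given might be sketched as
           follows.  The XML string and the variable names are made up for
           illustration; only new() and XMLin() come from this manual page.

```perl
use strict;
use warnings;
use XML::LibXML::Simple ();

# Options given to new() are reused for every XMLin() call on this object.
my $xs = XML::LibXML::Simple->new(KeyAttr => [], ForceArray => ['item']);

my $xml = '<opt><item>a</item></opt>';

# Uses the constructor options only: <item> is forced into an array.
my $forced = $xs->XMLin($xml);

# KeepRoot is passed with this call, so it is used for this call only.
my $kept = $xs->XMLin($xml, KeepRoot => 1);
```
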
   Translators
       $obj->XMLin($xmldata, %options)
           For $xmldata and descriptions of the %options see the "DETAILS"
           section of this manual page.

FUNCTIONS

       The functions "XMLin" (exported implicitly) and "xml_in" (exported on
       request) simply call "XML::LibXML::Simple->new->XMLin()" with the
       provided parameters.

DETAILS

   Parameter $xmldata
       As first parameter to XMLin(), you must provide the XML message to be
       translated into a Perl structure.  Choose one of the following:

       A filename
           If the filename contains no directory components, XMLin() will look
           for the file in each directory in the SearchPath (see "Parameter
           %options" below) and in the current directory.  eg:

             $data = XMLin('/etc/params.xml', %options);

       A dash (-)
           Parse from STDIN.

             $data = XMLin('-', %options);

       undef
           [deprecated] If there is no XML specifier, XMLin() will check the
           script directory and each of the SearchPath directories for a file
           with the same name as the script but with the extension '.xml'.
           Note: if you wish to specify options, you must specify the value
           'undef'.  eg:

             $data = XMLin(undef, ForceArray => 1);

           This feature is available for backwards compatibility with
           XML::Simple, but it is quite fragile.  You can easily hit the wrong
           xml file as input.  Please do not use it: always use an explicit
           filename.

       A string of XML
           A string containing XML (recognised by the presence of '<' and '>'
           characters) will be parsed directly.  eg:

             $data = XMLin('<opt username="bob" password="flurp" />', %options);

       An IO::Handle object
           In this case, XML::LibXML::Parser will read the XML data directly
           from the provided file handle.

             # $fh = IO::File->new('/etc/params.xml') or die;
             open my $fh, '<:encoding(utf8)', '/etc/params.xml' or die;

             $data = XMLin($fh, %options);

       An XML::LibXML::Document or ::Element
           [Not available in XML::Simple] When you have a pre-parsed
           XML::LibXML node, you can pass that.

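           Passing a pre-parsed node could look like the sketch below.  It
           assumes the document is parsed with XML::LibXML's load_xml();
           the XML string is the one used above for "A string of XML".

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

# Parse once with XML::LibXML ...
my $doc = XML::LibXML->load_xml(
    string => '<opt username="bob" password="flurp" />',
);

# ... then hand the document, or an element of it, to XMLin().
my $data = XMLin($doc);
my $same = XMLin($doc->documentElement);
```
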
   Parameter %options
       XML::LibXML::Simple supports most options defined by XML::Simple, so
       the interface is quite compatible.  Minor changes apply.  This
       explanation is extracted from the XML::Simple manual-page.

       •   Check out "ForceArray" because you'll almost certainly want to turn
           it on.

       •   Make sure you know what the "KeyAttr" option does and what its
           default value is because it may surprise you otherwise.

       •   Option names are case-insensitive so you can use the mixed case
           versions shown here; you can add underscores between the words (eg:
           key_attr) if you like.

       In alphabetic order:

       ContentKey => 'keyname' # seldom used
           When text content is parsed to a hash value, this option lets you
           specify a name for the hash key to override the default 'content'.
           So for example:

             XMLin('<opt one="1">Two</opt>', ContentKey => 'text')

           will parse to:

             { one => 1, text => 'Two' }

           instead of:

             { one => 1, content => 'Two' }

           You can also prefix your selected key name with a '-' character to
           have XMLin() try a little harder to eliminate unnecessary 'content'
           keys after array folding.  For example:

             XMLin(
               '<opt><item name="one">First</item><item name="two">Second</item></opt>',
               KeyAttr => {item => 'name'},
               ForceArray => [ 'item' ],
               ContentKey => '-content'
             )

           will parse to:

             {
               item => {
                 one =>  'First',
                 two =>  'Second'
               }
             }

           rather than this (without the '-'):

             {
               item => {
                 one => { content => 'First' },
                 two => { content => 'Second' }
               }
             }

       ForceArray => 1 # important
           This option should be set to '1' to force nested elements to be
           represented as arrays even when there is only one.  Eg, with
           ForceArray enabled, this XML:

               <opt>
                 <name>value</name>
               </opt>

           would parse to this:

               { name => [ 'value' ] }

           instead of this (the default):

               { name => 'value' }

           This option is especially useful if the data structure is likely to
           be written back out as XML and the default behaviour of rolling
           single nested elements up into attributes is not desirable.

           If you are using the array folding feature, you should almost
           certainly enable this option.  If you do not, single nested
           elements will not be parsed to arrays and therefore will not be
           candidates for folding to a hash.  (Given that the default value of
           'KeyAttr' enables array folding, this option should probably have
           been enabled by default as well).

       ForceArray => [ names ] # important
           This alternative (and preferred) form of the 'ForceArray' option
           allows you to specify a list of element names which should always
           be forced into an array representation, rather than the 'all or
           nothing' approach above.

           It is also possible to include compiled regular expressions in the
           list: any element names which match the pattern will be forced to
           arrays.  If the list contains only a single regex, then it is not
           necessary to enclose it in an arrayref.  Eg:

             ForceArray => qr/_list$/

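           A list which mixes a literal name with a pattern could be used as
           in this sketch.  The element names are invented for the example,
           and "KeyAttr" is set to an empty list only to keep array folding
           out of the picture.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><name>value</name><dir_list>/tmp</dir_list><size>3</size></opt>';

my $data = XMLin($xml,
    KeyAttr    => [],                      # no array folding in this sketch
    ForceArray => [ 'name', qr/_list$/ ],  # one literal name, one pattern
);

# <name> and <dir_list> are selected by the list, <size> is not.
my @forced = grep { ref $data->{$_} eq 'ARRAY' } sort keys %$data;
```
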
       ForceContent => 1 # seldom used
           When XMLin() parses elements which have text content as well as
           attributes, the text content must be represented as a hash value
           rather than a simple scalar.  This option allows you to force text
           content to always parse to a hash value even when there are no
           attributes.  So for example:

             XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)

           will parse to:

             {
               x => {         content => 'text1' },
               y => { a => 2, content => 'text2' }
             }

           instead of:

             {
               x => 'text1',
               y => { 'a' => 2, 'content' => 'text2' }
             }

       GroupTags => { grouping tag => grouped tag } # handy
           You can use this option to eliminate extra levels of indirection in
           your Perl data structure.  For example this XML:

             <opt>
               <searchpath>
                 <dir>/usr/bin</dir>
                 <dir>/usr/local/bin</dir>
                 <dir>/usr/X11/bin</dir>
               </searchpath>
             </opt>

           Would normally be read into a structure like this:

             {
               searchpath => {
                 dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
               }
             }

           But when read in with the appropriate value for 'GroupTags':

             my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });

           It will return this simpler structure:

             {
               searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
             }

           The grouping element ("<searchpath>" in the example) must not
           contain any attributes or elements other than the grouped element.

           You can specify multiple 'grouping element' to 'grouped element'
           mappings in the same hashref.  If this option is combined with
           "KeyAttr", the array folding will occur first and then the grouped
           element names will be eliminated.

       HookNodes => CODE
           Select document nodes to apply special tricks.  Introduced in
           [0.96], not available in XML::Simple.

           When this option is provided, the CODE will be called once the XML
           DOM tree is ready to get transformed into Perl.  Your CODE should
           return either "undef" (nothing to do) or a HASH which maps values
           of unique_key (see XML::LibXML::Node method "unique_key") onto CODE
           references to be called.

           Once the translator from XML into Perl reaches a selected node, it
           will call your routine specific for that node.  That triggering
           node is the only parameter.  When you return "undef", the node
           will not be found in the final result.  You may return any data
           (even the node itself) which will be included in the final result
           as is, under the name of the original node.

           Example:

              my $out = XMLin $file, HookNodes => \&protect_html;

              sub protect_html($$)
              {   # $obj is the instantiated XML::LibXML::Simple object
                  # $xml is a XML::LibXML::Element to get transformed
                  my ($obj, $xml) = @_;

                  my %hooks;    # collects the table of hooks

                  # do an xpath search for HTML
                  my $xpc   = XML::LibXML::XPathContext->new($xml);
                  my @nodes = $xpc->findnodes(...); #XXX
                  @nodes or return undef;

                  my $as_text = sub { $_[0]->toString(0) };  # as text
                  #  $as_node = sub { $_[0] };               # as node
                  #  $skip    = sub { undef };               # not at all

                  # the same behavior for all xpath nodes, in this example
                  $hooks{$_->unique_key} = $as_text
                      for @nodes;

                  \%hooks;
              }

       KeepRoot => 1 # handy
           In its attempt to return a data structure free of superfluous
           detail and unnecessary levels of indirection, XMLin() normally
           discards the root element name.  Setting the 'KeepRoot' option to
           '1' will cause the root element name to be retained.  So after
           executing this code:

             $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)

           You'll be able to reference the tempdir as
           "$config->{config}->{tempdir}" instead of the default
           "$config->{tempdir}".

       KeyAttr => [ list ] # important
           This option controls the 'array folding' feature which translates
           nested elements from an array to a hash.  It also controls the
           'unfolding' of hashes to arrays.

           For example, this XML:

               <opt>
                 <user login="grep" fullname="Gary R Epstein" />
                 <user login="stty" fullname="Simon T Tyson" />
               </opt>

           would, by default, parse to this:

               {
                 user => [
                    { login    => 'grep',
                      fullname => 'Gary R Epstein'
                    },
                    { login    => 'stty',
                      fullname => 'Simon T Tyson'
                    }
                 ]
               }

           If the option 'KeyAttr => "login"' were used to specify that the
           'login' attribute is a key, the same XML would parse to:

               {
                 user => {
                    stty => { fullname => 'Simon T Tyson' },
                    grep => { fullname => 'Gary R Epstein' }
                 }
               }

           The key attribute names should be supplied in an arrayref if there
           is more than one.  XMLin() will attempt to match attribute names in
           the order supplied.

           Note 1: The default value for 'KeyAttr' is "['name', 'key', 'id']".
           If you do not want folding on input or unfolding on output you must
           set this option to an empty list to disable the feature.

           Note 2: If you wish to use this option, you should also enable the
           "ForceArray" option.  Without 'ForceArray', a single nested element
           will be rolled up into a scalar rather than an array and therefore
           will not be folded (since only arrays get folded).

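           The effect of Note 1 might be checked with a sketch like the one
           below.  The element and attribute names are invented; the first
           call relies on the default key attributes listed above, the second
           one switches folding off with an empty list.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><item name="one" value="1" /><item name="two" value="2" /></opt>';

# Default KeyAttr: the 'name' attribute becomes the hash key.
my $folded = XMLin($xml);

# Empty list: no folding, the repeated elements stay in an array.
my $plain  = XMLin($xml, KeyAttr => []);
```
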
       KeyAttr => { list } # important
           This alternative (and preferred) method of specifying the key
           attributes allows more fine grained control over which elements are
           folded and on which attributes.  For example the option 'KeyAttr =>
           { package => "id" }' will cause any package elements to be folded
           on the 'id' attribute.  No other elements which have an 'id'
           attribute will be folded at all.

           Two further variations are made possible by prefixing a '+' or a
           '-' character to the attribute name:

           The option 'KeyAttr => { user => "+login" }' will cause this XML:

               <opt>
                 <user login="grep" fullname="Gary R Epstein" />
                 <user login="stty" fullname="Simon T Tyson" />
               </opt>

           to parse to this data structure:

               {
                 user => {
                    stty => {
                       fullname => 'Simon T Tyson',
                       login    => 'stty'
                    },
                    grep => {
                       fullname => 'Gary R Epstein',
                       login    => 'grep'
                    }
                 }
               }

           The '+' indicates that the value of the key attribute should be
           copied rather than moved to the folded hash key.

           A '-' prefix would produce this result:

               {
                 user => {
                    stty => {
                       fullname => 'Simon T Tyson',
                       -login   => 'stty'
                    },
                    grep => {
                       fullname => 'Gary R Epstein',
                       -login   => 'grep'
                    }
                 }
               }

       NoAttr => 1 # handy
           When used with XMLin(), any attributes in the XML will be ignored.

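           A small sketch of the difference, using an invented XML string.
           "KeyAttr" is emptied only so that array folding plays no role in
           the comparison.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt id="1"><x lang="en">text</x></opt>';

# Attributes are translated into hash keys ...
my $with    = XMLin($xml, KeyAttr => []);

# ... unless NoAttr asks XMLin() to ignore them.
my $without = XMLin($xml, KeyAttr => [], NoAttr => 1);
```
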
       NormaliseSpace => 0 | 1 | 2 # handy
           This option controls how whitespace in text content is handled.
           Recognised values for the option are:

           "0" (default) whitespace is passed through unaltered (except of
               course for the normalisation of whitespace in attribute values
               which is mandated by the XML recommendation)

           "1" whitespace is normalised in any value used as a hash key
               (normalising means removing leading and trailing whitespace and
               collapsing sequences of whitespace characters to a single
               space)

           "2" whitespace is normalised in all text content

           Note: you can spell this option with a 'z' if that is more natural
           for you.

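           The values "0" and "2" could be compared as in this sketch; the
           XML string is made up for the purpose.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><x>  hello   world  </x></opt>';

my $raw  = XMLin($xml);                       # default (0): text is untouched
my $norm = XMLin($xml, NormaliseSpace => 2);  # all text content is normalised
```
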
       Parser => OBJECT
           You may pass your own XML::LibXML object, instead of having one
           created for you. This is useful when you need specific
           configuration on that object (See XML::LibXML::Parser) or have
           implemented your own extension to that object.

           The internally created parser object is configured in safe mode.
           Read the XML::LibXML::Parser manual about security issues with
           certain parameter settings.  The default of XML::LibXML itself is
           unsafe!

       ParserOpts => HASH|ARRAY
           Pass parameters to the creation of a new internal parser object.
           You can overrule the options which will create a safe parser. It
           may be more readable to use the "Parser" parameter.

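           Supplying your own parser might look like the sketch below.  The
           three settings are ordinary XML::LibXML parser options chosen as
           an example of a cautious configuration; they are an assumption of
           this sketch, not necessarily what the internal parser uses.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

# A parser configured by hand; the chosen settings are only an example.
my $parser = XML::LibXML->new(
    no_network      => 1,   # never fetch anything over the network
    expand_entities => 0,   # leave entity references unexpanded
    load_ext_dtd    => 0,   # do not load external DTDs
);

# The parser is passed with the call to XMLin().
my $data = XMLin('<opt tempdir="/tmp" />', Parser => $parser);
```
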
       SearchPath => [ list ] # handy
           If you pass XMLin() a filename, but the filename includes no
           directory component, you can use this option to specify which
           directories should be searched to locate the file.  You might use
           this option to search first in the user's home directory, then in a
           global directory such as /etc.

           If a filename is provided to XMLin() but SearchPath is not defined,
           the file is assumed to be in the current directory.

           If the first parameter to XMLin() is undefined, the default
           SearchPath will contain only the directory in which the script
           itself is located.  Otherwise the default SearchPath will be empty.

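           A self-contained sketch of the lookup.  The temporary directory
           and the file name params.xml are created by the example itself
           and are purely illustrative.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::LibXML::Simple qw(XMLin);

# Create a throw-away directory which holds params.xml.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'params.xml');
open my $out, '>', $file or die "cannot write $file: $!";
print {$out} '<opt tempdir="/tmp" />';
close $out or die "cannot close $file: $!";

# The name has no directory component, so SearchPath is consulted.
my $data = XMLin('params.xml', SearchPath => [ $dir ]);
```
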
       SuppressEmpty => 1 | '' | undef
           [0.99] What to do with empty elements (no attributes and no
           content).  The default behaviour is to represent them as empty
           hashes.  Setting this option to a true value (eg: 1) will cause
           empty elements to be skipped altogether.  Setting the option to
           'undef' or the empty string will cause empty elements to be
           represented as the undefined value or the empty string
           respectively.

       ValueAttr => [ names ] # handy
           Use this option to deal with elements which always have a single
           attribute and no content.  Eg:

             <opt>
               <colour value="red" />
               <size   value="XXL" />
             </opt>

           Setting "ValueAttr => [ 'value' ]" will cause the above XML to
           parse to:

             {
               colour => 'red',
               size   => 'XXL'
             }

           instead of this (the default):

             {
               colour => { value => 'red' },
               size   => { value => 'XXL' }
             }

       NsExpand => 0 # advised
           When name-spaces are used, the default behavior is to include the
           prefix in the key name.  However, this is very dangerous: the
           prefixes can be changed without a change of the XML message
           meaning.  Therefore, it is better to use this "NsExpand" option.
           The downside, however, is that the labels get very long.

           Without this option:

             <record xmlns:x="http://xyz">
               <x:field1>42</x:field1>
             </record>
             <record xmlns:y="http://xyz">
               <y:field1>42</y:field1>
             </record>

           translates into

             { 'x:field1' => 42 }
             { 'y:field1' => 42 }

           but both source components have exactly the same meaning.  When
           "NsExpand" is used, the result is:

             { '{http://xyz}field1' => 42 }
             { '{http://xyz}field1' => 42 }

           Of course, addressing these fields is more work.  It is advised to
           implement it like this:

             my $ns = 'http://xyz';
             $data->{"{$ns}field1"};

       NsStrip => 0 # sloppy coding
           [not available in XML::Simple] Namespaces are really important to
           avoid name collisions, but they are a bit of a hassle.  To do it
           correctly, use option "NsExpand".  To do it sloppily, use
           "NsStrip".  With this option set, the above example will return

             { field1 => 42 }
             { field1 => 42 }

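           The three ways of naming a namespaced element can be put side by
           side, as in this sketch built on the first record of the example
           above.  Only the key of the <x:field1> element is looked at; the
           variable names are made up.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $ns  = 'http://xyz';
my $xml = qq{<record xmlns:x="$ns"><x:field1>42</x:field1></record>};

my $prefixed = XMLin($xml);                 # key with prefix:   'x:field1'
my $expanded = XMLin($xml, NsExpand => 1);  # key with namespace: "{$ns}field1"
my $stripped = XMLin($xml, NsStrip  => 1);  # key without either: 'field1'

my $value = $expanded->{"{$ns}field1"};
```
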

EXAMPLES

       When XMLin() reads the following very simple piece of XML:

           <opt username="testuser" password="frodo"></opt>

       it returns the following data structure:

           {
             username => 'testuser',
             password => 'frodo'
           }

       The identical result could have been produced with this alternative
       XML:

           <opt username="testuser" password="frodo" />

       Or this (although see 'ForceArray' option for variations):

           <opt>
             <username>testuser</username>
             <password>frodo</password>
           </opt>

       Repeated nested elements are represented as anonymous arrays:

           <opt>
             <person firstname="Joe" lastname="Smith">
               <email>joe@smith.com</email>
               <email>jsmith@yahoo.com</email>
             </person>
             <person firstname="Bob" lastname="Smith">
               <email>bob@smith.com</email>
             </person>
           </opt>

           {
             person => [
               { email     => [ 'joe@smith.com', 'jsmith@yahoo.com' ],
                 firstname => 'Joe',
                 lastname  => 'Smith'
               },
               { email     => 'bob@smith.com',
                 firstname => 'Bob',
                 lastname  => 'Smith'
               }
             ]
           }

       Nested elements with a recognised key attribute are transformed
       (folded) from an array into a hash keyed on the value of that attribute
       (see the "KeyAttr" option):

           <opt>
             <person key="jsmith" firstname="Joe" lastname="Smith" />
             <person key="tsmith" firstname="Tom" lastname="Smith" />
             <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
           </opt>

           {
             person => {
                jbloggs => {
                   firstname => 'Joe',
                   lastname  => 'Bloggs'
                },
                tsmith  => {
                   firstname => 'Tom',
                   lastname  => 'Smith'
                },
                jsmith  => {
                   firstname => 'Joe',
                   lastname  => 'Smith'
                }
             }
           }

       The <anon> tag can be used to form anonymous arrays:

           <opt>
             <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
             <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
             <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
             <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
           </opt>

           {
             head => [ [ 'Col 1', 'Col 2', 'Col 3' ] ],
             data => [ [ 'R1C1', 'R1C2', 'R1C3' ],
                       [ 'R2C1', 'R2C2', 'R2C3' ],
                       [ 'R3C1', 'R3C2', 'R3C3' ]
                     ]
           }

       Anonymous arrays can be nested to arbitrary levels and as a special
       case, if the surrounding tags for an XML document contain only an
       anonymous array the arrayref will be returned directly rather than the
       usual hashref:

           <opt>
             <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
             <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
             <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
           </opt>

           [
             [ 'Col 1', 'Col 2' ],
             [ 'R1C1', 'R1C2' ],
             [ 'R2C1', 'R2C2' ]
           ]

       Elements which only contain text content will simply be represented as
       a scalar.  Where an element has both attributes and text content, the
       element will be represented as a hashref with the text content in the
       'content' key (see the "ContentKey" option):

           <opt>
             <one>first</one>
             <two attr="value">second</two>
           </opt>

           {
             one => 'first',
             two => { attr => 'value', content => 'second' }
           }

       Mixed content (elements which contain both text content and nested
       elements) will not be represented in a useful way - element order
       and significant whitespace will be lost.  If you need to work with
       mixed content, then XML::Simple is not the right tool for your job -
       check out the next section.

   Differences to XML::Simple
       In general, the output and the options are equivalent, although this
       module has some differences with XML::Simple to be aware of.

       only XMLin() is supported
           If you want to write XML then use a schema (for instance with
           XML::Compile). Do not attempt to create XML by hand!  If you still
           think you need it, then have a look at XMLout() as implemented by
           XML::Simple or any of a zillion template systems.

       no "variables" option
           IMO, you should use a templating system if you want variables
           filled-in in the input: it is not a task for this module.

       ForceArray options
           There are a few small differences in the result of the "forcearray"
           option, because XML::Simple seems to behave inconsistently.

       hooks
           XML::Simple does not support hooks.

SEE ALSO

       XML::Compile for processing XML when a schema is available.  When you
       have a schema, the data and structure of your message get validated.

       XML::Simple, the original implementation whose interface is followed as
       closely as possible.

COPYRIGHTS

       The interface design and large parts of the documentation were taken
       from the XML::Simple module, written by Grant McLean <grantm@cpan.org>.

       Copyrights 2008-2020 on the perl code and the related documentation by
       [Mark Overmeer <markov@cpan.org>]. For other contributors see
       ChangeLog.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.  See http://dev.perl.org/licenses/

perl v5.38.0                      2023-07-21            XML::LibXML::Simple(3)