XML::LibXML::Simple(3)  User Contributed Perl Documentation  XML::LibXML::Simple(3)

NAME

6       XML::LibXML::Simple - XML::LibXML clone of XML::Simple::XMLin()
7

INHERITANCE

       XML::LibXML::Simple
         is an Exporter
11

SYNOPSIS

13         my $xml  = ...;  # filename, fh, string, or XML::LibXML-node
14
15       Imperative:
16
17         use XML::LibXML::Simple   qw(XMLin);
18         my $data = XMLin $xml, %options;
19
20       Or the Object Oriented way:
21
22         use XML::LibXML::Simple   ();
23         my $xs   = XML::LibXML::Simple->new(%options);
24         my $data = $xs->XMLin($xml, %options);
25

DESCRIPTION

27       This module is a blunt rewrite of XML::Simple (by Grant McLean) to use
28       the XML::LibXML parser for XML structures, where the original uses
29       plain Perl or SAX parsers.
30
       Be warned: this module tries to be smart.  You may very well shoot
       yourself in the foot with this DWIMmery.  Read the whole manual page at
       least once before you start using it.  If your XML is described in a
       schema or WSDL, then use XML::Compile for maintainable code.
35

METHODS

37   Constructors
38       XML::LibXML::Simple->new(%options)
           Instantiate an object on which XMLin() can be called.  You can
           provide %options to this constructor (to be reused for each call
           to XMLin) and with each call of XMLin (to be used once).
42
43           For descriptions of the %options see the "DETAILS" section of this
44           manual page.
45
46   Translators
47       $obj->XMLin($xmldata, %options)
48           For $xmldata and descriptions of the %options see the "DETAILS"
49           section of this manual page.
50

FUNCTIONS

       The functions "XMLin" (exported implicitly) and "xml_in" (exported on
       request) simply call "XML::LibXML::Simple->new->XMLin()" with the
       provided parameters.
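
       The equivalence described above can be sketched as follows.  This is a
       minimal illustration, assuming the module is installed; the sample XML
       string is made up for the example.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin xml_in);

# Illustrative input; any well-formed XML string would do.
my $xml = '<opt username="bob" password="flurp" />';

# Functional interface: a temporary object is created behind the scenes.
my $from_function = XMLin($xml);

# xml_in is only imported because it is named in the import list above.
my $from_alias = xml_in($xml);

# The object-oriented call the functions are documented to wrap.
my $from_object = XML::LibXML::Simple->new->XMLin($xml);

print "$_->{username}\n" for $from_function, $from_alias, $from_object;
```

       All three calls are expected to produce the same structure, so the
       choice between them is mostly a matter of style and of whether options
       need to be reused between calls.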
55

DETAILS

57   Parameter $xmldata
       The first parameter to XMLin() must provide the XML message to be
       translated into a Perl structure.  Choose one of the following:
60
61       A filename
62           If the filename contains no directory components, "XMLin()" will
63           look for the file in each directory in the SearchPath (see OPTIONS
64           below) and in the current directory.  eg:
65
66             $data = XMLin('/etc/params.xml', %options);
67
68       A dash  (-)
69           Parse from STDIN.
70
71             $data = XMLin('-', %options);
72
73       undef
74           [deprecated] If there is no XML specifier, "XMLin()" will check the
75           script directory and each of the SearchPath directories for a file
76           with the same name as the script but with the extension '.xml'.
77           Note: if you wish to specify options, you must specify the value
78           'undef'.  eg:
79
80             $data = XMLin(undef, ForceArray => 1);
81
           This feature is available for backwards compatibility with
           XML::Simple, but it is fragile: you can easily pick up the wrong
           XML file as input.  Please do not use it: always use an explicit
           filename.
86
87       A string of XML
88           A string containing XML (recognised by the presence of '<' and '>'
89           characters) will be parsed directly.  eg:
90
91             $data = XMLin('<opt username="bob" password="flurp" />', %options);
92
93       An IO::Handle object
94           In this case, XML::LibXML::Parser will read the XML data directly
95           from the provided file.
96
97             # $fh = IO::File->new('/etc/params.xml') or die;
98             open my $fh, '<:encoding(utf8)', '/etc/params.xml' or die;
99
100             $data = XMLin($fh, %options);
101
102       An XML::LibXML::Document or ::Element
103           [Not available in XML::Simple] When you have a pre-parsed
104           XML::LibXML node, you can pass that.
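
       A short sketch of three of these input forms side by side: a string,
       a pre-parsed document and an element node.  The XML content is
       illustrative; "load_xml" is the regular XML::LibXML constructor for
       parsing a string.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

my $string = '<opt username="bob" password="flurp" />';

# 1. A string of XML is parsed directly.
my $from_string = XMLin($string);

# 2. A pre-parsed document, here built with XML::LibXML's load_xml().
my $doc      = XML::LibXML->load_xml(string => $string);
my $from_doc = XMLin($doc);

# 3. An element node, here simply the root element of that document.
my $from_element = XMLin($doc->documentElement);

print "$_->{username}:$_->{password}\n"
    for $from_string, $from_doc, $from_element;
```

       Passing a node is convenient when the document has already been parsed
       for other purposes, because the XML does not get parsed a second time.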
105
106   Parameter %options
107       XML::LibXML::Simple supports most options defined by XML::Simple, so
108       the interface is quite compatible.  Minor changes apply.  This
109       explanation is extracted from the XML::Simple manual-page.
110
111       ·   check out "ForceArray" because you'll almost certainly want to turn
112           it on
113
114       ·   make sure you know what the "KeyAttr" option does and what its
115           default value is because it may surprise you otherwise.
116
       ·   Option names are case-insensitive, so you can use the mixed-case
           versions shown here; you can add underscores between the words (eg:
           key_attr) if you like.
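
       As a small sketch of the last point, the three spellings below are
       expected to select the same option.  The element name "item" is an
       arbitrary choice for this illustration.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><item>value</item></opt>';

# Three spellings of the same option name.
my @spellings = (
    [ ForceArray  => ['item'] ],
    [ forcearray  => ['item'] ],
    [ force_array => ['item'] ],
);

for my $pair (@spellings) {
    my $data = XMLin($xml, @$pair);
    # Report what kind of reference the 'item' value is for this spelling.
    print "$pair->[0]: ", ref($data->{item}), "\n";
}
```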
120
121       In alphabetic order:
122
123       ContentKey => 'keyname' # seldom used
124           When text content is parsed to a hash value, this option lets you
125           specify a name for the hash key to override the default 'content'.
126           So for example:
127
128             XMLin('<opt one="1">Two</opt>', ContentKey => 'text')
129
130           will parse to:
131
132             { one => 1, text => 'Two' }
133
134           instead of:
135
136             { one => 1, content => 'Two' }
137
138           You can also prefix your selected key name with a '-' character to
139           have "XMLin()" try a little harder to eliminate unnecessary
140           'content' keys after array folding.  For example:
141
142             XMLin(
143               '<opt><item name="one">First</item><item name="two">Second</item></opt>',
144               KeyAttr => {item => 'name'},
145               ForceArray => [ 'item' ],
146               ContentKey => '-content'
147             )
148
149           will parse to:
150
             {
               item => {
                 one => 'First',
                 two => 'Second'
               }
             }

           rather than this (without the '-'):

             {
               item => {
                 one => { content => 'First' },
                 two => { content => 'Second' }
               }
             }
166
167       ForceArray => 1 # important
168           This option should be set to '1' to force nested elements to be
169           represented as arrays even when there is only one.  Eg, with
170           ForceArray enabled, this XML:
171
172               <opt>
173                 <name>value</name>
174               </opt>
175
176           would parse to this:
177
178               { name => [ 'value' ] }
179
180           instead of this (the default):
181
182               { name => 'value' }
183
184           This option is especially useful if the data structure is likely to
185           be written back out as XML and the default behaviour of rolling
186           single nested elements up into attributes is not desirable.
187
188           If you are using the array folding feature, you should almost
189           certainly enable this option.  If you do not, single nested
190           elements will not be parsed to arrays and therefore will not be
           candidates for folding to a hash.  (Given that the default value of
           'KeyAttr' enables array folding, this option should probably have
           been enabled by default as well.)
194
195       ForceArray => [ names ] # important
196           This alternative (and preferred) form of the 'ForceArray' option
197           allows you to specify a list of element names which should always
198           be forced into an array representation, rather than the 'all or
199           nothing' approach above.
200
           It is also possible to include compiled regular expressions in the
           list: any element names which match the pattern will be forced to
           arrays.  If the list contains only a single regex, then it is not
           necessary to enclose it in an arrayref.  Eg:
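
           A minimal sketch of both forms, using made-up element names
           ("host_list" and "port") purely for illustration:

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><host_list>alpha</host_list><port>80</port></opt>';

# Force one element by name ...
my $by_name  = XMLin($xml, ForceArray => ['host_list']);

# ... or every element whose name matches a compiled pattern.
my $by_regex = XMLin($xml, ForceArray => qr/_list$/);

for my $data ($by_name, $by_regex) {
    print "host_list is a ", (ref $data->{host_list} || 'plain scalar'), "\n";
    print "port is a ",      (ref $data->{port}      || 'plain scalar'), "\n";
}
```

           Elements which are not listed and do not match, such as "port"
           here, keep their default representation.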
205
206             ForceArray => qr/_list$/
207
208       ForceContent => 1 # seldom used
209           When "XMLin()" parses elements which have text content as well as
210           attributes, the text content must be represented as a hash value
211           rather than a simple scalar.  This option allows you to force text
212           content to always parse to a hash value even when there are no
213           attributes.  So for example:
214
215             XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)
216
217           will parse to:
218
219             {
220               x => {         content => 'text1' },
221               y => { a => 2, content => 'text2' }
222             }
223
224           instead of:
225
226             {
227               x => 'text1',
228               y => { 'a' => 2, 'content' => 'text2' }
229             }
230
231       GroupTags => { grouping tag => grouped tag } # handy
232           You can use this option to eliminate extra levels of indirection in
233           your Perl data structure.  For example this XML:
234
235             <opt>
236              <searchpath>
237                <dir>/usr/bin</dir>
238                <dir>/usr/local/bin</dir>
239                <dir>/usr/X11/bin</dir>
240              </searchpath>
241            </opt>
242
243           Would normally be read into a structure like this:
244
245             {
246               searchpath => {
247                  dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
248               }
249             }
250
251           But when read in with the appropriate value for 'GroupTags':
252
253             my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });
254
255           It will return this simpler structure:
256
257             {
258               searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
259             }
260
261           The grouping element ("<searchpath>" in the example) must not
262           contain any attributes or elements other than the grouped element.
263
264           You can specify multiple 'grouping element' to 'grouped element'
265           mappings in the same hashref.  If this option is combined with
266           "KeyAttr", the array folding will occur first and then the grouped
267           element names will be eliminated.
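
           The searchpath example above can be tried out with a sketch like
           this one.  The XML is written without whitespace between the tags
           only to keep the string short.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><searchpath>'
        . '<dir>/usr/bin</dir><dir>/usr/local/bin</dir>'
        . '</searchpath></opt>';

# Without GroupTags the <dir> level stays in the structure.
my $plain   = XMLin($xml);

# With GroupTags the intermediate 'dir' key is removed.
my $grouped = XMLin($xml, GroupTags => { searchpath => 'dir' });

print join(':', @{ $plain->{searchpath}{dir} }), "\n";
print join(':', @{ $grouped->{searchpath} }),    "\n";
```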
268
269       HookNodes => CODE
270           Select document nodes to apply special tricks.  Introduced in
271           [0.96], not available in XML::Simple.
272
           When this option is provided, the CODE will be called once the XML
           DOM tree is ready to be transformed into Perl.  Your CODE should
           return either "undef" (nothing to do) or a HASH which maps values
           of unique_key (see the XML::LibXML::Node method "unique_key") onto
           CODE references to be called.

           Once the translator from XML into Perl reaches a selected node, it
           will call the routine registered for that node.  The triggering
           node is the only parameter.  When your routine returns "undef", the
           node will not appear in the final result.  You may return any
           data (even the node itself), which will be included in the final
           result as is, under the name of the original node.
285
286           Example:
287
              my $out = XMLin $file, HookNodes => \&protect_html;

              sub protect_html($$)
              {   # $obj is the instantiated XML::LibXML::Simple object
                  # $xml is a XML::LibXML::Element to get transformed
                  my ($obj, $xml) = @_;

                  my %hooks;    # collects the table of hooks

                  # do an xpath search for HTML
                  my $xpc   = XML::LibXML::XPathContext->new($xml);
                  my @nodes = $xpc->findnodes(...); #XXX
                  @nodes or return undef;

                  my $as_text = sub { $_[0]->toString(0) };  # as text
                  #  $as_node = sub { $_[0] };               # as node
                  #  $skip    = sub { undef };               # not at all

                  # the same behavior for all xpath nodes, in this example
                  $hooks{$_->unique_key} = $as_text
                      for @nodes;

                  \%hooks;
              }
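
           A self-contained variation on this idea, as a sketch only: the
           XPath expression '//body', the sample XML and the name
           "keep_body_as_text" are assumptions made for the illustration, and
           an XML::LibXML version which provides "unique_key" is required.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

# Illustrative input: <body> holds markup we want to keep as a raw string.
my $xml = '<opt><title>Demo</title><body><b>bold</b> text</body></opt>';

# Hypothetical hook selector: picks every <body> element via XPath.
sub keep_body_as_text {
    my ($obj, $root) = @_;

    my @nodes = $root->findnodes('//body');
    @nodes or return undef;

    # Serialize each selected node instead of translating it into Perl data.
    my $as_text = sub { $_[0]->toString(0) };

    my %hooks;
    $hooks{ $_->unique_key } = $as_text for @nodes;
    return \%hooks;
}

my $data = XMLin($xml, HookNodes => \&keep_body_as_text);

print "$data->{title}\n";
print "$data->{body}\n";
```

           The hooked element should come back as serialized XML text, while
           the rest of the document is translated as usual.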
312
313       KeepRoot => 1 # handy
314           In its attempt to return a data structure free of superfluous
315           detail and unnecessary levels of indirection, "XMLin()" normally
316           discards the root element name.  Setting the 'KeepRoot' option to
317           '1' will cause the root element name to be retained.  So after
318           executing this code:
319
320             $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)
321
322           You'll be able to reference the tempdir as
323           "$config->{config}->{tempdir}" instead of the default
324           "$config->{tempdir}".
325
326       KeyAttr => [ list ] # important
327           This option controls the 'array folding' feature which translates
328           nested elements from an array to a hash.  It also controls the
329           'unfolding' of hashes to arrays.
330
331           For example, this XML:
332
333               <opt>
334                 <user login="grep" fullname="Gary R Epstein" />
335                 <user login="stty" fullname="Simon T Tyson" />
336               </opt>
337
338           would, by default, parse to this:
339
340               {
341                 user => [
342                    { login    => 'grep',
343                      fullname => 'Gary R Epstein'
344                    },
345                    { login    => 'stty',
346                      fullname => 'Simon T Tyson'
347                    }
348                 ]
349               }
350
351           If the option 'KeyAttr => "login"' were used to specify that the
352           'login' attribute is a key, the same XML would parse to:
353
354               {
355                 user => {
356                    stty => { fullname => 'Simon T Tyson' },
357                    grep => { fullname => 'Gary R Epstein' }
358                 }
359               }
360
361           The key attribute names should be supplied in an arrayref if there
362           is more than one.  "XMLin()" will attempt to match attribute names
363           in the order supplied.
364
           Note 1: The default value for 'KeyAttr' is "['name', 'key', 'id']".
           If you do not want folding on input or unfolding on output you must
           set this option to an empty list to disable the feature.
368
369           Note 2: If you wish to use this option, you should also enable the
370           "ForceArray" option.  Without 'ForceArray', a single nested element
371           will be rolled up into a scalar rather than an array and therefore
372           will not be folded (since only arrays get folded).
373
374       KeyAttr => { list } # important
           This alternative (and preferred) method of specifying the key
           attributes allows more fine-grained control over which elements are
           folded and on which attributes.  For example the option 'KeyAttr =>
           { package => "id" }' will cause any package elements to be folded on
           the 'id' attribute.  No other elements which have an 'id' attribute
           will be folded at all.
381
382           Two further variations are made possible by prefixing a '+' or a
383           '-' character to the attribute name:
384
385           The option 'KeyAttr => { user => "+login" }' will cause this XML:
386
387               <opt>
388                 <user login="grep" fullname="Gary R Epstein" />
389                 <user login="stty" fullname="Simon T Tyson" />
390               </opt>
391
392           to parse to this data structure:
393
394               {
395                 user => {
396                    stty => {
397                       fullname => 'Simon T Tyson',
398                       login    => 'stty'
399                    },
400                    grep => {
401                       fullname => 'Gary R Epstein',
402                       login    => 'grep'
403                    }
404                 }
405               }
406
407           The '+' indicates that the value of the key attribute should be
408           copied rather than moved to the folded hash key.
409
410           A '-' prefix would produce this result:
411
412               {
413                 user => {
414                    stty => {
415                       fullname => 'Simon T Tyson',
416                       -login   => 'stty'
417                    },
418                    grep => {
419                       fullname => 'Gary R Epstein',
420                       -login    => 'grep'
421                    }
422                 }
423               }
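
           The variations of 'KeyAttr' can be compared with a sketch like the
           following, which reuses the user records from the examples above:

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt>'
        . '<user login="grep" fullname="Gary R Epstein" />'
        . '<user login="stty" fullname="Simon T Tyson" />'
        . '</opt>';

# Fold <user> on 'login'; the attribute is moved into the hash key.
my $moved    = XMLin($xml, KeyAttr => { user => 'login' });

# Same folding, but '+' keeps a copy of 'login' inside each record.
my $copied   = XMLin($xml, KeyAttr => { user => '+login' });

# An empty list disables folding: the users stay in an array.
my $unfolded = XMLin($xml, KeyAttr => []);

print "moved:    ", join(',', sort keys %{ $moved->{user}{grep} }),  "\n";
print "copied:   ", join(',', sort keys %{ $copied->{user}{grep} }), "\n";
print "unfolded: ", ref($unfolded->{user}), "\n";
```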
424
425       NoAttr => 1 # handy
426           When used with "XMLin()", any attributes in the XML will be
427           ignored.
428
429       NormaliseSpace => 0 | 1 | 2 # handy
430           This option controls how whitespace in text content is handled.
431           Recognised values for the option are:
432
433           "0" (default) whitespace is passed through unaltered (except of
434               course for the normalisation of whitespace in attribute values
435               which is mandated by the XML recommendation)
436
437           "1" whitespace is normalised in any value used as a hash key
438               (normalising means removing leading and trailing whitespace and
439               collapsing sequences of whitespace characters to a single
440               space)
441
442           "2" whitespace is normalised in all text content
443
444           Note: you can spell this option with a 'z' if that is more natural
445           for you.
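
           A minimal sketch of the effect of value 2, with a made-up element
           "msg" whose text has leading, trailing and repeated blanks:

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><msg>  hello   world </msg></opt>';

# Default behaviour: the text is returned as found in the document.
my $raw   = XMLin($xml);

# NormaliseSpace => 2: trim and collapse whitespace in all text content.
my $clean = XMLin($xml, NormaliseSpace => 2);

print "raw:   [$raw->{msg}]\n";
print "clean: [$clean->{msg}]\n";
```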
446
447       Parser => OBJECT
           You may pass your own XML::LibXML object, instead of having one
           created for you.  This is useful when you need specific
           configuration on that object (see XML::LibXML::Parser) or have
           implemented your own extension to that object.

           The internally created parser object is configured in safe mode.
           A parser you create yourself is not: the XML::LibXML defaults are
           unsafe.  Read the XML::LibXML::Parser manual about the security
           issues with certain parameter settings.
456
457       ParserOpts => HASH|ARRAY
           Pass parameters to the creation of a new internal parser object.
           You can overrule the options which will create a safe parser.  It
           may be more readable to use the "Parser" parameter.
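
           A sketch of passing a hand-configured parser.  The three settings
           shown are ordinary XML::LibXML::Parser options chosen here as one
           plausible conservative configuration; they are an assumption of
           this example, not necessarily the exact set this module applies
           internally.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

# A parser configured by hand; check XML::LibXML::Parser for the settings
# appropriate to your own input.
my $parser = XML::LibXML->new(
    no_network      => 1,   # never fetch remote resources
    expand_entities => 0,   # leave entity references unexpanded
    load_ext_dtd    => 0,   # do not load external DTD subsets
);

my $data = XMLin('<opt username="bob" />', Parser => $parser);

print "$data->{username}\n";
```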
461
462       SearchPath => [ list ] # handy
           If you pass "XMLin()" a filename, but the filename includes no
           directory component, you can use this option to specify which
           directories should be searched to locate the file.  You might use
           this option to search first in the user's home directory, then in a
           global directory such as /etc.
468
469           If a filename is provided to "XMLin()" but SearchPath is not
470           defined, the file is assumed to be in the current directory.
471
472           If the first parameter to "XMLin()" is undefined, the default
473           SearchPath will contain only the directory in which the script
474           itself is located.  Otherwise the default SearchPath will be empty.
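
           The lookup can be sketched with a temporary directory.  The file
           name "xls_demo_params.xml" is invented for this illustration and
           the directory is removed again when the script ends.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::LibXML::Simple qw(XMLin);

# Create a scratch directory holding one small XML file.
my $dir  = tempdir(CLEANUP => 1);
my $file = 'xls_demo_params.xml';              # hypothetical file name
my $path = File::Spec->catfile($dir, $file);

open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} '<opt username="bob" />';
close $out or die "Cannot close $path: $!";

# The bare file name has no directory component, so SearchPath is used.
my $data = XMLin($file, SearchPath => [ $dir ]);

print "$data->{username}\n";
```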
475
476       SuppressEmpty => 1 | '' | undef
477           [0.99] What to do with empty elements (no attributes and no
478           content).  The default behaviour is to represent them as empty
479           hashes.  Setting this option to a true value (eg: 1) will cause
480           empty elements to be skipped altogether.  Setting the option to
481           'undef' or the empty string will cause empty elements to be
482           represented as the undefined value or the empty string
483           respectively.
484
485       ValueAttr => [ names ] # handy
           Use this option to deal with elements which always have a single
           attribute and no content.  Eg:
488
489             <opt>
490               <colour value="red" />
491               <size   value="XXL" />
492             </opt>
493
494           Setting "ValueAttr => [ 'value' ]" will cause the above XML to
495           parse to:
496
497             {
498               colour => 'red',
499               size   => 'XXL'
500             }
501
502           instead of this (the default):
503
504             {
505               colour => { value => 'red' },
506               size   => { value => 'XXL' }
507             }
508
       NsExpand => 0 # advised
           When namespaces are used, the default behavior is to include the
           prefix in the key name.  However, this is very dangerous: the
           prefixes can be changed without changing the meaning of the XML
           message.  Therefore, you are better off using this "NsExpand"
           option.  The downside, however, is that the labels get very long.
515
516           Without this option:
517
518             <record xmlns:x="http://xyz">
519               <x:field1>42</x:field1>
520             </record>
521             <record xmlns:y="http://xyz">
522               <y:field1>42</y:field1>
523             </record>
524
525           translates into
526
527             { 'x:field1' => 42 }
528             { 'y:field1' => 42 }
529
           but both source components have exactly the same meaning.  When
           "NsExpand" is used, the result is:
532
533             { '{http://xyz}field1' => 42 }
534             { '{http://xyz}field1' => 42 }
535
536           Of course, addressing these fields is more work.  It is advised to
537           implement it like this:
538
539             my $ns = 'http://xyz';
540             $data->{"{$ns}field1"};
541
       NsStrip => 0 # sloppy coding
           [not available in XML::Simple] Namespaces are really important to
           avoid name collisions, but they are a bit of a hassle.  To do it
           correctly, use option "NsExpand".  To do it sloppily, use "NsStrip".
           With this option set, the above example will return
547
548             { field1 => 42 }
549             { field1 => 42 }
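
           The three ways of naming a namespaced field can be compared with a
           sketch like this one.  Only the "field1" value is looked up; the
           result may contain other keys as well.

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $ns  = 'http://xyz';
my $xml = qq{<record xmlns:x="$ns"><x:field1>42</x:field1></record>};

# Default: the key carries whatever prefix the sender happened to use.
my $prefixed = XMLin($xml);

# NsExpand: the key carries the namespace URI, independent of the prefix.
my $expanded = XMLin($xml, NsExpand => 1);

# NsStrip: the key is the bare local name.
my $stripped = XMLin($xml, NsStrip => 1);

print "prefixed: $prefixed->{'x:field1'}\n";
print "expanded: ", $expanded->{"{$ns}field1"}, "\n";
print "stripped: $stripped->{field1}\n";
```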
550

EXAMPLES

552       When "XMLin()" reads the following very simple piece of XML:
553
554           <opt username="testuser" password="frodo"></opt>
555
556       it returns the following data structure:
557
558           {
559             username => 'testuser',
560             password => 'frodo'
561           }
562
563       The identical result could have been produced with this alternative
564       XML:
565
566           <opt username="testuser" password="frodo" />
567
568       Or this (although see 'ForceArray' option for variations):
569
570           <opt>
571             <username>testuser</username>
572             <password>frodo</password>
573           </opt>
574
575       Repeated nested elements are represented as anonymous arrays:
576
577           <opt>
578             <person firstname="Joe" lastname="Smith">
579               <email>joe@smith.com</email>
580               <email>jsmith@yahoo.com</email>
581             </person>
582             <person firstname="Bob" lastname="Smith">
583               <email>bob@smith.com</email>
584             </person>
585           </opt>
586
587           {
588             person => [
589               { email     => [ 'joe@smith.com', 'jsmith@yahoo.com' ],
590                 firstname => 'Joe',
591                 lastname  => 'Smith'
592               },
593               { email     => 'bob@smith.com',
594                 firstname => 'Bob',
595                 lastname  => 'Smith'
596               }
597             ]
598           }
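
       Note that Bob's single email comes back as a plain string while Joe's
       two addresses form an array.  Code that loops over the addresses may
       want to force "email" into an array, as in this sketch:

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt>'
  . '<person firstname="Joe" lastname="Smith">'
  .   '<email>joe@smith.com</email><email>jsmith@yahoo.com</email>'
  . '</person>'
  . '<person firstname="Bob" lastname="Smith">'
  .   '<email>bob@smith.com</email>'
  . '</person>'
  . '</opt>';

# Force 'email' to be an array even when there is only one address.
my $data = XMLin($xml, ForceArray => ['email']);

for my $person (@{ $data->{person} }) {
    print "$person->{firstname}: ", join(', ', @{ $person->{email} }), "\n";
}
```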
599
600       Nested elements with a recognised key attribute are transformed
601       (folded) from an array into a hash keyed on the value of that attribute
602       (see the "KeyAttr" option):
603
604           <opt>
605             <person key="jsmith" firstname="Joe" lastname="Smith" />
606             <person key="tsmith" firstname="Tom" lastname="Smith" />
607             <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
608           </opt>
609
610           {
611             person => {
612                jbloggs => {
613                   firstname => 'Joe',
614                   lastname  => 'Bloggs'
615                },
616                tsmith  => {
617                   firstname => 'Tom',
618                   lastname  => 'Smith'
619                },
620                jsmith => {
621                   firstname => 'Joe',
622                   lastname => 'Smith'
623                }
624             }
625           }
626
627       The <anon> tag can be used to form anonymous arrays:
628
629           <opt>
630             <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
631             <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
632             <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
633             <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
634           </opt>
635
636           {
637             head => [ [ 'Col 1', 'Col 2', 'Col 3' ] ],
638             data => [ [ 'R1C1', 'R1C2', 'R1C3' ],
639                       [ 'R2C1', 'R2C2', 'R2C3' ],
640                       [ 'R3C1', 'R3C2', 'R3C3' ]
641                     ]
642           }
643
       Anonymous arrays can be nested to arbitrary levels and, as a special
       case, if the surrounding tags for an XML document contain only an
       anonymous array, the arrayref will be returned directly rather than the
       usual hashref:
648
649           <opt>
650             <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
651             <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
652             <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
653           </opt>
654
655           [
656             [ 'Col 1', 'Col 2' ],
657             [ 'R1C1', 'R1C2' ],
658             [ 'R2C1', 'R2C2' ]
659           ]
660
661       Elements which only contain text content will simply be represented as
662       a scalar.  Where an element has both attributes and text content, the
663       element will be represented as a hashref with the text content in the
664       'content' key (see the "ContentKey" option):
665
666         <opt>
667           <one>first</one>
668           <two attr="value">second</two>
669         </opt>
670
671         {
672           one => 'first',
673           two => { attr => 'value', content => 'second' }
674         }
675
       Mixed content (elements which contain both text content and nested
       elements) will not be represented in a useful way: element order
       and significant whitespace will be lost.  If you need to work with
       mixed content, then XML::Simple is not the right tool for your job -
       check out the next section.
681
682   Differences to XML::Simple
683       In general, the output and the options are equivalent, although this
684       module has some differences with XML::Simple to be aware of.
685
686       only XMLin() is supported
687           If you want to write XML then use a schema (for instance with
688           XML::Compile). Do not attempt to create XML by hand!  If you still
689           think you need it, then have a look at XMLout() as implemented by
690           XML::Simple or any of a zillion template systems.
691
692       no "variables" option
693           IMO, you should use a templating system if you want variables
694           filled-in in the input: it is not a task for this module.
695
696       ForceArray options
           There are a few small differences in the result of the "forcearray"
           option, because XML::Simple seems to behave inconsistently.
699
700       hooks
701           XML::Simple does not support hooks.
702

SEE ALSO

704       XML::Compile for processing XML when a schema is available.  When you
705       have a schema, the data and structure of your message get validated.
706
       XML::Simple, the original implementation, whose interface is followed
       as closely as possible.
709

COPYRIGHTS

       The interface design and large parts of the documentation were taken
       from the XML::Simple module, written by Grant McLean <grantm@cpan.org>.

       Copyrights of the perl code and the related documentation 2008-2020
       by Mark Overmeer <markov@cpan.org>.  For other contributors see
       ChangeLog.
717
718       This program is free software; you can redistribute it and/or modify it
719       under the same terms as Perl itself.  See http://dev.perl.org/licenses/
720
perl v5.30.1                      2020-01-30            XML::LibXML::Simple(3)