XML::Smart(3)         User Contributed Perl Documentation        XML::Smart(3)

NAME

       XML::Smart - A smart, easy and powerful way to access or create XML
       from files, data and URLs.

VERSION

       Version 1.79

SYNOPSIS

       This module provides an easy way to access/create XML data. It's based
       on a HASH tree created from the XML data, and enables dynamic access to
       it through the standard Perl syntax for Hash and Array, without
       necessarily caring about which you are working with. In other words,
       each point in the tree works as a Hash and an Array at the same time!

       This module additionally provides special resources such as: search for
       nodes by attribute, select an attribute value across multiple nodes,
       change the returned format, and so on.

       The module also automatically handles binary data (encoding/decoding
       to/from base64), CDATA (like contents with <tags>) and Unicode. It can
       be used to create XML files, load XML from the Web (just by using a
       URL as the file path) and has an easy way to send XML data through
       sockets - just by adding the length of the data in the <?xml?> header.

       You can use XML::Smart with XML::Parser, or with the 2 standard parsers
       of XML::Smart:

       XML::Smart::Parser
       XML::Smart::HTMLParser.

       XML::Smart::HTMLParser can be used to load/parse wild/bad XML data, or
       HTML tags.

Tutorial and F.A.Q.

       You can find some extra documents about XML::Smart at:

       XML::Smart::Tutorial - Tutorial and examples for XML::Smart.
       XML::Smart::FAQ      - Frequently Asked Questions about XML::Smart.

USAGE

         ## Create the object and load the file:
         my $XML = XML::Smart->new('file.xml') ;

         ## Force the use of the parser 'XML::Smart::Parser'.
         my $XML = XML::Smart->new('file.xml' , 'XML::Smart::Parser') ;

         ## Get from the web:
         my $XML = XML::Smart->new('http://www.perlmonks.org/index.pl?node_id=16046') ;

         ## Cut the root:
         $XML = $XML->cut_root ;

         ## Or change the root:
         $XML = $XML->{hosts} ;

         ## Get the address [0] of server [0]:
         my $srv0_addr0 = $XML->{server}[0]{address}[0] ;
         ## ...or...
         my $srv0_addr0 = $XML->{server}{address} ;

         ## Get the server where the attribute 'type' eq 'suse':
         my $server = $XML->{server}('type','eq','suse') ;

         ## Get the address again:
         my $addr1 = $server->{address}[1] ;
         ## ...or...
         my $addr1 = $XML->{server}('type','eq','suse'){address}[1] ;

         ## Get all the addresses of a server:
         my @addrs = @{$XML->{server}{address}} ;
         ## ...or...
         my @addrs = $XML->{server}{address}('@') ;

         ## Get a list of types of all the servers:
         my @types = $XML->{server}('[@]','type') ;

         ## Add a new server node:
         my $newsrv = {
           os      => 'Linux' ,
           type    => 'Mandrake' ,
           version => 8.9 ,
           address => [qw(192.168.3.201 192.168.3.202)]
         } ;

         push(@{$XML->{server}} , $newsrv) ;

         ## Get/rebuild the XML data:
         my $xmldata = $XML->data ;

         ## Save in some file:
         $XML->save('newfile.xml') ;

         ## Send through a socket:
         print $socket $XML->data(length => 1) ; ## show the 'length' in the XML header so the
                                                 ## socket knows the amount of data to read.

         __DATA__
         <?xml version="1.0" encoding="iso-8859-1"?>
         <hosts>
           <server os="linux" type="redhat" version="8.0">
             <address>192.168.0.1</address>
             <address>192.168.0.2</address>
           </server>
           <server os="linux" type="suse" version="7.0">
             <address>192.168.1.10</address>
             <address>192.168.1.20</address>
           </server>
           <server address="192.168.2.100" os="linux" type="conectiva" version="9.0"/>
         </hosts>

METHODS

   new (FILE|DATA|URL , PARSER , OPTIONS)
       Create an XML object.

       Arguments:

       FILE|DATA|URL
                 The first argument can be:

                   - XML data as string.
                   - File path.
                   - File Handle (GLOB).
                   - URL (needs LWP::UserAgent).

                 If not passed, a null XML tree is started, where you should
                 create your own XML data, then build/save/send it.

       PARSER (optional)
                 Set the XML parser to use. Options:

                   XML::Parser
                   XML::Smart::Parser
                   XML::Smart::HTMLParser

                 XML::Smart::Parser can only handle basic XML data (PCDATA
                 and headers like ENTITY, NOTATION, etc. are not supported),
                 but it is a good choice when you don't want to install big
                 modules to parse XML, since it comes with the main module.
                 It can still handle CDATA and binary data.

                 ** See "PARSING HTML as XML" for XML::Smart::HTMLParser.

                 Aliases for the options:

                   SMART|REGEXP   => XML::Smart::Parser
                   HTML           => XML::Smart::HTMLParser

                 Default:

                 If not set it will look for XML::Parser and load it.  If
                 XML::Parser can't be loaded it will use XML::Smart::Parser,
                 which is actually a clone of XML::Parser::Lite with some
                 fixes.

       OPTIONS   You can force upper case or lower case for tags (nodes)
                 and arguments (attributes), and set other extra options.

                 lowtag    Make the tags lower case.

                 lowarg    Make the arguments lower case.

                 upertag   Make the tags upper case.

                 uperarg   Make the arguments upper case.

                 arg_single
                           Set the value of arguments to 1 when they have an
                           undef value.

                           ** This option only works when the XML is parsed
                           by XML::Smart::HTMLParser, since it accepts
                           arguments without values:

                             my $xml = XML::Smart->new(
                             '<root><foo arg1="" flag></root>' ,
                             'XML::Smart::HTMLParser' ,
                             arg_single => 1 ,
                             ) ;

                           In this example the option "arg_single" was used,
                           which sets flag to 1, but arg1 will still have a
                           null string value ("").

                           Here's the tree of the example above:

                             'root' => {
                                         'foo' => {
                                                    'flag' => 1,
                                                    'arg1' => ''
                                                  },
                                       },

                 use_spaces
                           Accept contents that have only spaces.

                 on_start (CODE) *optional
                           Code/sub to call at the start of a tag.

                           ** This is called after XML::Smart parses the
                           tag, and should be used only if you want to change
                           the tree.

                 on_char (CODE) *optional
                           Code/sub to call on content.

                           ** This is called after XML::Smart parses the
                           tag, and should be used only if you want to change
                           the tree.

                 on_end (CODE) *optional
                           Code/sub to call at the end of a tag.

                           ** This is called after XML::Smart parses the
                           tag, and should be used only if you want to change
                           the tree.

                 ** These options are applied when the XML data is loaded. For
                 XML generation see data() OPTIONS.

       Examples of use:

         my $xml_from_url = XML::Smart->new("http://www.perlmonks.org/index.pl?node_id=16046") ;

         ...

         my $xml_from_str = XML::Smart->new(q`<?xml version="1.0" encoding="iso-8859-1" ?>
         <root>
           <foo arg="xyz"/>
         </root>
         `) ;

         ...

         my $null_xml = XML::Smart->new() ;

         ...

         my $xml_from_html = XML::Smart->new($html_data , 'html' ,
           lowtag  => 1 ,
           lowarg  => 1 ,
           on_char => sub {
             my ( $tag , $pointer , $pointer_back , $cont) = @_ ;
             $pointer->{extra_arg} = 123 ; ## add an extra argument.
             $pointer_back->{$tag}{extra_arg} = 123 ; ## Same, but using the previous pointer.
             $$cont .= "\n" ; ## append data to the content.
           }
         ) ;
       
   apply_dtd (DTD , OPTIONS)
       Apply the DTD to the XML tree.

       DTD can be a source, file, GLOB or URL.

       This method is useful if you need the XML generated by data() to be
       formatted according to a specific DTD: elements will automatically
       become nodes, attributes will be checked, required elements and
       attributes will be created, the element order will be set, etc.

       OPTIONS:

       no_delete BOOL
                 If TRUE, elements and attributes that are not defined in the
                 DTD won't be deleted from the XML tree.

       Example of use:

         $xml->apply_dtd(q`
         <!DOCTYPE cds [
         <!ELEMENT cds (album+)>
         <!ATTLIST cds
                   creator  CDATA
                   date     CDATA #REQUIRED
                   type     (a|b|c) #REQUIRED "a"
         >
         <!ELEMENT album (#PCDATA)>
         ]>
         ` ,
         no_delete => 1 ,
         );
       
   args()
       Return the argument names (not nodes).

   args_values()
       Return the argument values (not nodes).

   back()
       Move the pointer back one level in the tree.

       ** See base().

   base()
       Get back to the base of the tree.

       Each query to the XML::Smart object returns an object pointing to a
       different place in the tree (and sharing the same HASH tree). So, you
       can get the main object again (an object that points to the base):

         my $srv = $XML->{root}{hosts}{server} ;
         my $addr = $srv->{address} ;
         my $XML2 = $srv->base() ;
         $XML2->{root}{hosts}...
       
   content()
       Return the content of a node:

         ## Data:
         <foo>my content</foo>

         ## Access:

         my $content = $XML->{foo}->content ;
         print "<<$content>>\n" ; ## show: <<my content>>

         ## or just:
         my $content = $XML->{foo} ;

       It can also be used with multiple contents.

       For this XML data:

         <root>
         content0
         <tag1 arg="1"/>
         content1
         </root>

       Getting all the content:

         my $all_content = $XML->{root}->content ;
         print "[$all_content]\n" ;

       Output:

         [
         content0

         content1
         ]

       Getting it in parts:

         my @contents = $XML->{root}->content ;
         print "[$contents[0]]\n" ;
         print "[$contents[1]]\n" ;

       Output:

         [
         content0
         ]
         [
         content1
         ]

       Setting multiple contents:

         $XML->{root}->content(0,"aaaaa") ;
         $XML->{root}->content(1,"bbbbb") ;

       Output now will be:

         [aaaaa]
         [bbbbb]

       And now the XML data generated will be:

         <root>aaaaa<tag1 arg="1"/>bbbbb</root>
       
   copy()
       Return a copy of the XML::Smart object (pointing to the base).

       ** This is good when you want to keep 2 versions of the same XML tree
       in memory, since one object can't change the tree of the other!

       WARNING: set_node(), set_cdata() and set_binary() changes are not
       persistent over copy - once you create a second copy these states are
       lost.

       WARNING: do not copy after apply_dtd() unless you have checked for
       DTD errors.
       
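       A minimal sketch of the independence described for copy(). The
       document and the names $XML and $copy are made up for the example;
       only new(), copy() and plain HASH access are used.

```perl
use strict;
use warnings;
use XML::Smart;

# Parse a tiny document from a string (example data, not from a real file).
my $XML = XML::Smart->new(q{<root><foo arg="1"/></root>});

# copy() returns a separate object with its own tree.
my $copy = $XML->copy;

# Change the attribute only in the copy.
$copy->{root}{foo}{arg} = 2;

# Stringify both values; the original is expected to be left untouched.
my $orig_arg = "$XML->{root}{foo}{arg}";
my $copy_arg = "$copy->{root}{foo}{arg}";
print "original=$orig_arg copy=$copy_arg\n";
```

       Keep the WARNING notes in mind: node/CDATA/binary flags set on the
       original are not carried over to the copy.
       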
   cut_root()
       Cut the root key:

         my $srv = $XML->{rootx}{host}{server} ;

         ## Or if you don't know the root name:
         $XML = $XML->cut_root() ;
         my $srv = $XML->{host}{server} ;

       ** Note that this will cut the root of the pointer in the tree.  So, if
       you are in some place that has more than one key (multiple roots), the
       same object will be returned without cutting anything.
       
   data (OPTIONS)
       Return the data of the XML object (rebuilding it).

       Options:

       nodtd      Do not add to the XML content the DTD applied by the method
                  apply_dtd().

       noident    If set to true the data isn't indented.

       nospace    If set to true the data isn't indented and doesn't have
                  spaces between the tags (unless the CONTENT has them).

       lowtag     Make the tags lower case.

       lowarg     Make the arguments lower case.

       upertag    Make the tags upper case.

       uperarg    Make the arguments upper case.

       length     If set to true, add the attribute 'length' with the size of
                  the data to the XML header (<?xml ...?>).  This is useful
                  when you send the data through a socket, since the receiver
                  can know the total amount of data to read.

       noheader   Do not add the <?xml ...?> header.

       nometagen  Do not add the meta generator tag: <?meta
                  generator="XML::Smart" ?>

       meta       Set the meta tags of the XML document.

       decode     As of VERSION 1.73 there are three different base64
                  encodings that are used. They are picked based on which of
                  them support the data provided. If you want to retrieve data
                  using the 'data' function the resultant xml will have
                  dt:dt="binary.based" contained within it. To retrieve the
                  decoded data use: $XML->data( decode => 1 )

                  Examples of the meta option:

                      my $meta = {
                      build_from => "wxWindows 2.4.0" ,
                      file => "wx26.htm" ,
                      } ;

                      print $XML->data( meta => $meta ) ;

                      __DATA__
                      <?meta build_from="wxWindows 2.4.0" file="wx26.htm" ?>

                  Multiple meta:

                      my $meta = [
                      {build_from => "wxWindows 2.4.0" , file => "wx26.htm" } ,
                      {script => "genxml.pl" , ver => "1.0" } ,
                      ] ;

                      __DATA__
                      <?meta build_from="wxWindows 2.4.0" file="wx26.htm" ?>
                      <?meta script="genxml.pl" ver="1.0" ?>

                  Or set the meta tag directly:

                      my $meta = '<?meta foo="bar" ?>' ;

                      ## For multiple:
                      my $meta = ['<?meta foo="bar" ?>' , '<?meta x="1" ?>'] ;

                      print $XML->data( meta => $meta ) ;

       tree       Set the HASH tree to parse. If not set, the tree of the
                  XML::Smart object (tree()) is used.

       wild       Accept wild tags and arguments.

                  ** This won't fix wrong keys and tags.

       sortall    Sort all the tags alphabetically. If not set, the order of
                  the loaded document, or the order of tag creation, is kept.
                  Default: off
       
   data_pointer (OPTIONS)
       Build the data from the current point in the XML tree (not from the
       base, as data() does).

       Accepts the same OPTIONS as the method data().

   dump_tree()
       Dump the tree of the object using Data::Dumper.

   dump_tree_pointer()
       Dump the tree of the object, from the pointer, using Data::Dumper.

   dump_pointer()
       ** Same as dump_tree_pointer().

   i()
       Return the index of the value.

       ** If the value is from a hash key (not an ARRAY ref) undef is
       returned.

   is_node()
       Return true if a key is a node.

   key()
       Return the key of the value.

       In list context (wantarray) the index is returned too:
       return(KEY , I) ;

   nodes()
       Return the nodes (objects) in the pointer (keys that aren't arguments).

   nodes_keys()
       Return the node names (not the objects) in the pointer (keys that
       aren't arguments).

   null()
       Return true if the XML object has a null tree or if the pointer is in
       some place that doesn't exist.

   order()
       Return the order of the keys. See set_order().

   path()
       Return the path of the pointer.

       Example:

         /hosts/server[1]/address[0]

       Note that the index is 0 based and 'address' can be an attribute or a
       node, which is not compatible with XPath.

       ** See path_as_xpath().

   path_as_xpath()
       Return the path of the pointer in the XPath format.

   pointer
       Return the HASH tree from the pointer.

   pointer_ok
       Return a copy of the tree of the object, from the pointer, but without
       internal keys added by XML::Smart.

   root
       Return the ROOT name of the XML tree (main key).

       ** See also key() for sub nodes.

   save (FILEPATH , OPTIONS)
       Save the XML data inside a file.

       Accepts the same OPTIONS as the method data().

   set_auto
       Define the key to be handled automatically. So, data() will determine
       automatically whether it's a node, content or attribute.

       ** This method is useful to remove set_node(), set_cdata() and
       set_binary() changes.

   set_auto_node
       Define the key as a node, and data() will determine automatically
       whether it's CDATA or BINARY.

       ** This method is useful to remove set_cdata() and set_binary()
       changes.
       
   set_binary(BOOL)
       Define the node as BINARY content when TRUE, or force it not to be
       handled as BINARY when FALSE.

       Example of a node handled as BINARY:

         <root><foo dt:dt="binary.base64">PGgxPnRlc3QgAzwvaDE+</foo></root>

       Original content of foo (decoded from the base64 data):

         <h1>test \x03</h1>
       
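       The base64 string in the example above can be checked by hand with
       the core module MIME::Base64. This sketch does not use XML::Smart at
       all; it only shows what the encoded and decoded forms look like.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# The encoded content of <foo> from the example above.
my $encoded = 'PGgxPnRlc3QgAzwvaDE+';

# Decode it back to the original bytes, including the \x03 control byte.
my $decoded = decode_base64($encoded);

# Round trip: the empty second argument suppresses the trailing newline.
my $again = encode_base64($decoded, '');

printf "decoded %d bytes, round trip %s\n",
    length($decoded), ($again eq $encoded ? 'ok' : 'differs');
```
       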
   set_cdata(BOOL)
       Define the node as CDATA when TRUE, or force it not to be handled as
       CDATA when FALSE.

       Example of a CDATA node:

         <root><foo><![CDATA[bla bla bla <tag> bla bla]]></foo></root>

   set_node(BOOL)
       Set/unset the current key as a node (tag).

       ** If BOOL is not defined, TRUE is used.

       WARNING: You cannot call set_node(), copy the object and then call
       set_node(0) (unset node).

   set_order(KEYS)
       Set the order of the keys (nodes and attributes) at this point.

   set_tag
       Same as set_node.

   tree()
       Return the HASH tree of the XML data.

       ** Note that the real HASH tree is returned here. All the other ways
       return an object that works like a HASH/ARRAY through tie.
       
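       A short sketch of the difference between the object view and the
       plain tree returned by tree(). The document is made up for the
       example, and the shape of the tree is assumed to follow the dump
       shown for arg_single in new().

```perl
use strict;
use warnings;
use XML::Smart;

my $XML = XML::Smart->new(q{<root><foo arg="xyz"/></root>});

# Normal access returns an XML::Smart object that works through tie.
my $obj = $XML->{root}{foo};

# tree() returns the underlying plain Perl HASH reference.
my $tree = $XML->tree;

my $obj_kind  = ref $obj;
my $tree_kind = ref $tree;
print "object: $obj_kind, tree: $tree_kind\n";

# In the plain tree an attribute is an ordinary scalar value.
my $arg = $tree->{root}{foo}{arg};
print "arg: $arg\n";
```

       Changes made directly in the plain tree bypass the tie layer, so the
       automatic HASH/ARRAY handling described under ACCESS does not apply
       there.
       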
   tree_pointer()
       Same as pointer().

   tree_ok()
       Return a copy of the tree of the object, but without internal keys
       added by XML::Smart, like /order and /nodes.

   tree_pointer_ok()
       Return a copy of the tree of the object, from the pointer, but without
       internal keys added by XML::Smart.

   xpath() || XPath()
       Return an XML::XPath object, based on the XML root in the tree.

         ## look from the root:
         my $data = $XML->XPath->findnodes_as_string('/') ;

       ** Needs XML::XPath installed, but it is only loaded when needed.

   xpath_pointer() || XPath_pointer()
       Return an XML::XPath object, based on the XML::Smart pointer in the
       tree.

         ## look from this point, so XPath '/' actually starts at /server/:

         my $srvs = $XML->{server} ;
         my $data = $srvs->XPath_pointer->findnodes_as_string('/') ;

       ** Needs XML::XPath installed, but it is only loaded when needed.

   ANNIHILATE
       XML::Smart uses XML::XPath that, for performance reasons, leaks memory.
       To ensure that this memory is freed you can explicitly call ANNIHILATE
       before the XML::Smart object goes out of scope.

ACCESS

       To access the data you use the object in a way similar to HASH and
       ARRAY:

         my $XML = XML::Smart->new('file.xml') ;

         my $server = $XML->{server} ;

       But when you get a key {server}, you are actually accessing the data
       through tie(), not directly in the HASH tree inside the object. This
       fixes wrong accesses:

         ## {server} is a normal key, not an ARRAY ref:

         my $server = $XML->{server}[0] ; ## return $XML->{server}
         my $server = $XML->{server}[1] ; ## return UNDEF

         ## {server} has an ARRAY with 2 items:

         my $server = $XML->{server} ;    ## return $XML->{server}[0]
         my $server = $XML->{server}[0] ; ## return $XML->{server}[0]
         my $server = $XML->{server}[1] ; ## return $XML->{server}[1]

       To get all the values of multiple elements/keys:

         ## This works with only a string inside {address}, or with an ARRAY ref:
         my @addresses = @{$XML->{server}{address}} ;
       
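       The access rules above can be tried on a small in-memory document.
       This is a sketch that assumes the data is passed to new() as a
       string; the variable names are only for illustration.

```perl
use strict;
use warnings;
use XML::Smart;

my $XML = XML::Smart->new(q{<hosts>
  <server os="linux" type="redhat" version="8.0">
    <address>192.168.0.1</address>
    <address>192.168.0.2</address>
  </server>
  <server address="192.168.2.100" os="linux" type="conectiva" version="9.0"/>
</hosts>});

my $hosts = $XML->{hosts};

# With or without explicit indexes, the first server and first address.
my $first_a = "$hosts->{server}{address}";
my $first_b = "$hosts->{server}[0]{address}[0]";

# All addresses of the first server, stringified to plain values.
my @all = map { "$_" } @{ $hosts->{server}{address} };

# A single attribute value can be read through the ARRAY syntax too.
my @single = map { "$_" } @{ $hosts->{server}[1]{address} };

print "first: $first_a / $first_b\n";
print "all: @all\n";
print "single: @single\n";
```
       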
   Select search
       When you don't know the position of the nodes, you can select them by
       some attribute value:

         my $server = $XML->{server}('type','eq','suse') ; ## return $XML->{server}[1]

       Syntax for the select search:

         (NAME, CONDITION , VALUE)

       NAME      The attribute name in the node (tag).

       CONDITION Can be

                   eq  ne  ==  !=  <=  >=  <  >

                 For REGEX:

                   =~  !~

                   ## Case insensitive:
                   =~i !~i

       VALUE     The value.

                 For REGEX use it like this:

                   $XML->{server}('type','=~','^s\w+$') ;
       
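       A runnable sketch of the select search, using the sample data from
       USAGE passed in as a string. The variable names are illustrative;
       the calls are the ones documented above.

```perl
use strict;
use warnings;
use XML::Smart;

my $XML = XML::Smart->new(q{<hosts>
  <server os="linux" type="redhat" version="8.0">
    <address>192.168.0.1</address>
    <address>192.168.0.2</address>
  </server>
  <server os="linux" type="suse" version="7.0">
    <address>192.168.1.10</address>
    <address>192.168.1.20</address>
  </server>
  <server address="192.168.2.100" os="linux" type="conectiva" version="9.0"/>
</hosts>});

my $hosts = $XML->{hosts};

# Exact match on the 'type' attribute.
my $suse = $hosts->{server}('type', 'eq', 'suse');
my $addr = "$suse->{address}[1]";

# Regex match: single quotes keep the backslash in the pattern.
my $by_re   = $hosts->{server}('type', '=~', '^s\w+$');
my $re_type = "$by_re->{type}";

print "second suse address: $addr\n";
print "type matched by regex: $re_type\n";
```
       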
   Select attributes in multiple nodes:
       You can get the list of values of an attribute by looking in all the
       multiple nodes:

         ## Get all the server types:
         my @types = $XML->{server}('[@]','type') ;

       Also as:

         my @types = $XML->{server}{type}('<@') ;

       Without the resource:

         my @list ;
         my @servers = @{$XML->{server}} ;

         foreach my $servers_i ( @servers ) {
           push(@list , $servers_i->{type} ) ;
         }

   Return format
       You can change the returned format:

       Syntax:

         (TYPE)

       Where TYPE can be:

         $  ## the content.
         @  ## an array (list of multiple values).
         %  ## a hash.
         .  ## The exact point in the tree, not an object.

         $@  ## an array, but with the contents, not objects.
         $%  ## a hash, but the values are the contents, not objects.

         ## Using $@ and $% is good if you don't want to keep the object
         ## reference (and want to save memory).

         @keys  ## The keys of the node. Note that if you have a key with
                ## multiple nodes, it will be replicated (this is the
                ## difference from "keys %{$this->{node}}" ).

         <@ ## Return the attribute in the previous node, but looking for
            ## multiple nodes. Example:

         my @names = $this->{method}{wxFrame}{arg}{name}('<@') ;
         #### @names = (parent , id , title) ;

         <xml> ## Return the XML data from this point.

         __DATA__
         <method>
           <wxFrame return="wxFrame">
             <arg name="parent" type="wxWindow" />
             <arg name="id" type="wxWindowID" />
             <arg name="title" type="wxString" />
           </wxFrame>
         </method>

       Example:

         ## A server's content
         my $name = $XML->{server}{name}('$') ;
         ## ... or:
         my $name = $XML->{server}{name}->content ;
         ## ... or:
         my $name = $XML->{server}{name} ;
         $name = "$name" ;

         ## All the servers
         my @servers = $XML->{server}('@') ;
         ## ... or:
         my @servers = @{$XML->{server}} ;

         ## It still has the object reference:
         $servers[0]->{name} ;

         ## Without the reference:
         my @servers = $XML->{server}('$@') ;

         ## The XML data, same as data_pointer():
         my $xml_data = $XML->{server}('<xml>') ;

   CONTENT
       If a {key} has content you can access it directly from the variable
       or from the method:

         my $server = $XML->{server} ;

         print "Content: $server\n" ;
         ## ...or...
         print "Content: ". $server->content ."\n" ;

       So, if you use the object as a string it works as a string, and if you
       use it as an object it works as an object! ;-P

       ** See the method content() for more.

CREATING XML DATA

       Creating XML data is easy: you just use the object as a normal HASH,
       but you don't need to care about multiple nodes or ARRAY
       creation/conversion!

         ## Create a null XML object:
         my $XML = XML::Smart->new() ;

         ## Add a server to the list:
         $XML->{server} = {
           os      => 'Linux' ,
           type    => 'mandrake' ,
           version => 8.9 ,
           address => '192.168.3.201' ,
         } ;

         ## The data now:
         <server address="192.168.3.201" os="Linux" type="mandrake" version="8.9"/>

         ## Add a new address to the server. This creates an ARRAY and
         ## converts the previous key to an ARRAY:
         $XML->{server}{address}[1] = '192.168.3.202' ;

         ## The data now:
         <server os="Linux" type="mandrake" version="8.9">
           <address>192.168.3.201</address>
           <address>192.168.3.202</address>
         </server>

       After creating your XML tree you just save it or get the data:

         ## Get the data:
         my $data = $XML->data ;

         ## Or save it directly:
         $XML->save('newfile.xml') ;

         ## Or send to a socket:
         print $socket $XML->data(length => 1) ;
       
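       On the receiving side of a socket, the length attribute has to be
       read back from the header. The helper below is hypothetical (it is
       not part of XML::Smart) and assumes a header of the form
       <?xml ... length="N" ?>; what exactly the number counts is not
       specified here, so the sketch only extracts it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the numeric 'length' attribute of an
# <?xml ...?> header, or undef when the header or attribute is missing.
sub xml_header_length {
    my ($header) = @_;
    return undef unless $header =~ /^\s*<\?xml\b([^>]*)\?>/s;
    my $attrs = $1;
    return $attrs =~ /\blength\s*=\s*["'](\d+)["']/ ? $1 : undef;
}

my $with    = xml_header_length('<?xml version="1.0" encoding="iso-8859-1" length="123" ?>');
my $without = xml_header_length('<?xml version="1.0" encoding="iso-8859-1" ?>');

print "with length: $with\n";
print "without length: ", (defined $without ? $without : 'undef'), "\n";
```

       A reader could use the returned number to decide how many more bytes
       to read before handing the data to XML::Smart->new().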

BINARY DATA & CDATA

       From version 1.2 XML::Smart can handle binary data and CDATA blocks
       automatically.

       When parsing, binary data will be detected as:

         <code dt:dt="binary.base64">f1NPTUUgQklOQVJZIERBVEE=</code>

       This is the official format for binary data described at XML.com
       <http://www.xml.com/pub/a/98/07/binary/binary.html>.  The content
       will be decoded from base64 and saved in the object tree.

       CDATA will be parsed like any other content, since CDATA is only a
       block that won't be parsed.

       When creating XML data, like at $XML->data(), the binary format and
       CDATA are detected using these rules:

         BINARY:
         - If your data has characters that can't be in XML.

         * Characters accepted:

           \s \w \d
           !"#$%&'()*+,-./:;<=>?@[\]^`{|}~
           0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8e, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96,
           0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9e, 0x9f, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa,
           0xab, 0xac, 0xad, 0xae, 0xaf, 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc,
           0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce,
           0xcf, 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, 0xe0,
           0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef, 0xf0, 0xf1, 0xf2,
           0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff, 0x20

         TODO: 0x80, 0x81, 0x8d, 0x8f, 0x90, 0xa0

         CDATA:
         - If it has tags: <...>

         CONTENT: (<tag>content</tag>)
         - If it has \r\n\t, or ' and " at the same time.

       So, this will be a CDATA content:

         <code><![CDATA[
           line1
           <tag_not_parsed>
           line2
         ]]></code>

       If binary content is detected, it will be converted to base64 and a
       dt:dt attribute will be added to the tag to tell the format.

         <code dt:dt="binary.base64">f1NPTUUgQklOQVJZIERBVEE=</code>

       NOTE: As of VERSION 1.73 there are three different base64 encodings
       that are used. They are picked based on which of them support the data
       provided. If you want to retrieve data using the 'data' function the
       resultant xml will have dt:dt="binary.based" contained within it. To
       retrieve the decoded data use: $XML->data( decode => 1 )
       
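       The detection rules listed in this section can be read as a small
       decision procedure. The function below is only an illustration of
       that reading, not the code XML::Smart uses: the name guess_xml_kind
       and the 'plain' fallback are made up, \s is approximated by
       \t\n\r\f and space, and the input is assumed to be a byte string.

```perl
use strict;
use warnings;

# Illustrative classifier following the documented rules, in order.
sub guess_xml_kind {
    my ($value) = @_;

    # BINARY: any byte outside the accepted set (whitespace, printable
    # ASCII, and the high bytes listed in the table above).
    return 'binary'
        if $value =~ /[^\t\n\r\f\x20-\x7e\x82-\x8c\x8e\x91-\x9c\x9e\x9f\xa1-\xff]/;

    # CDATA: the value contains something that looks like a tag.
    return 'cdata' if $value =~ /<.*?>/s;

    # CONTENT: line breaks or tabs, or both quote styles at once.
    return 'content'
        if $value =~ /[\r\n\t]/ or ( $value =~ /'/ and $value =~ /"/ );

    # Nothing special detected.
    return 'plain';
}

for my $sample ( "\x7fSOME BINARY DATA", "line1\n<tag_not_parsed>\nline2",
                 "a\tb", '192.168.0.1' ) {
    printf "%-8s for a value of %d bytes\n",
        guess_xml_kind($sample), length $sample;
}
```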

UNICODE and ASCII-extended (ISO-8859-1)

904       XML::Smart support only thse 2 encode types, Unicode (UTF-8) and ASCII-
905       extended (ISO-8859-1), and must be enough. (Note that UTF-8 is only
906       supported on Perl-5.8+).
907
908       When creating XML data, if any UTF-8 character is detected the encoding
909       attribute in the <?xml ...?> header will be set to UTF-8:
910
911         <?xml version="1.0" encoding="utf-8" ?>
912         <data>0x82, 0x83</data>
913
914       If not, the iso-8859-1 is used:
915
916         <?xml version="1.0" encoding="iso-8859-1" ?>
917         <data>0x82</data>
918
       When loading XML data with UTF-8, Perl (5.8+) should do all the work
       internally.
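
       A minimal sketch of the header choice described above. It assumes
       that "UTF-8 character" means a code point above 255 in a Perl
       character string; the real detection in XML::Smart may work on bytes
       instead. The names pick_encoding and xml_header are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the encoding name for the <?xml ?> header.
# Assumption: a "UTF-8 character" is any code point above 255.
sub pick_encoding {
    my ($text) = @_;
    return $text =~ /[^\x{00}-\x{FF}]/ ? 'utf-8' : 'iso-8859-1';
}

# Hypothetical helper: build the header line for the given content.
sub xml_header {
    my ($text) = @_;
    return sprintf '<?xml version="1.0" encoding="%s" ?>', pick_encoding($text);
}

print xml_header("caf\x{e9}"), "\n";        # fits in ISO-8859-1
print xml_header("smile \x{263A}"), "\n";   # needs UTF-8
```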
921

PARSING HTML as XML, or BAD XML formats

       You can use the special parser XML::Smart::HTMLParser to load HTML as
       XML, or to load XML data that is not well formed.

       The differences between a normal XML parser and XML::Smart::HTMLParser
       are:
928
         - Accepts values without quotes:
           <foo bar=x>

         - Accepts any data in the values, including <> and &:
           <root><echo sample="echo \"Hello!\">out.txt"></root>

         - Accepts URI values without quotes:
           <link url=http://www.foo.com/dir/file?query?q=v&x=y target=#_blank>

         - Tags don't need to be closed by adding '/' before '>':
           <root><foo bar="1"></root>

           ** Note that the parser will try hard to detect the nodes, and
              whether to auto-close them or not.

         - The document doesn't need to have only one root:
           <foo>data</foo><bar>data</bar>
946
       So, XML::Smart::HTMLParser is a wild way to load marked-up data (like
       HTML), and is useful if you don't want to care about quotes, end tags,
       etc. when writing your XML data by hand. You can write a bad XML file
       by hand, load it with XML::Smart::HTMLParser, and save it again as
       well-formed XML! ;-P
952
       ** Note that <SCRIPT> tags will only be parsed correctly if the content
       is inside comments <!--...-->, since the content can contain tags:
955
956         <SCRIPT LANGUAGE="JavaScript"><!--
957         document.writeln("some <tag> in the string");
958         --></SCRIPT>
959

ENTITIES

       Entities (ENTITY) are handled by the parser. So, if you use XML::Parser
       it will do the whole job fine. But if you use XML::Smart::Parser or
       XML::Smart::HTMLParser, only the basic entities (defaults) will be
       parsed:
965
966         &lt;   => The less than sign (<).
967         &gt;   => The greater than sign (>).
968         &amp;  => The ampersand (&).
969         &apos; => The single quote or apostrophe (').
970         &quot; => The double quote (").
971
         &#ddd;  => An ASCII character or a Unicode character (>255), where ddd is a decimal number.
         &#xHHH; => A Unicode character, where HHH is a hexadecimal number.
974
       When creating XML data, existing entities won't be changed, and the
       characters '<', '&' and '>' will be converted to the appropriate
       entities.

       ** Note that if the content has a <tag>, the characters '<' and '>'
       won't be converted to entities, and the content will be placed inside
       a CDATA block.
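
       The entity handling described in this section can be sketched with
       two regex-based helpers. Both function names are hypothetical and
       this is not the code used by XML::Smart or its parsers; it only
       covers the basic entities listed above.

```perl
use strict;
use warnings;

# Hypothetical helper: escape '<', '>' and '&', leaving existing
# entities untouched.
sub escape_entities {
    my ($text) = @_;
    # Escape '&' only when it does not already start an entity.
    $text =~ s/&(?!(?:lt|gt|amp|apos|quot|#\d+|#x[0-9A-Fa-f]+);)/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: decode the basic named and numeric entities in a
# single pass, so that "&amp;lt;" becomes "&lt;" and not "<".
sub decode_entities {
    my ($text) = @_;
    my %named = (lt => '<', gt => '>', amp => '&', apos => "'", quot => '"');
    $text =~ s{&(?:#x([0-9A-Fa-f]+)|#(\d+)|(lt|gt|amp|apos|quot));}
              { defined $1 ? chr(hex $1) : defined $2 ? chr($2) : $named{$3} }ge;
    return $text;
}

print escape_entities('a < b &amp; c & d'), "\n";   # a &lt; b &amp; c &amp; d
print decode_entities('&lt;b&gt; &#65;&#x42;'), "\n";   # <b> AB
```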
982

WHY AND HOW IT WORKS

       Everyone who has tried to use Perl HASHes and ARRAYs to access XML
       data, as in XML::Simple, has had problems adding new nodes, or
       accessing a node without knowing whether it is inside an ARRAY, a
       HASH or a HASH key. XML::Smart builds a very dynamic way to access
       the data around this, since any node/point in the tree can be a HASH
       and an ARRAY at the same time. You also have other extra resources,
       like a search for nodes by attribute:
991
         my $server = $XML->{server}('type','eq','suse') ; ## This syntax is not wrong! ;-)

         ## Instead of:
         my $server = $XML->{server}[1] ;

         __DATA__
         <hosts>
           <server os="linux" type="redhat" version="8.0"/>
           <server os="linux" type="suse" version="7.0"/>
         </hosts>
1002
       The idea for this module came from the difficulty of accessing a
       complex structure in XML. You need to know what the structure looks
       like, which is generally found out by looking at the XML file (which
       is wrong). At the same time, it is hard to always check the structure
       (in code) before accessing it. XML is a good and easy format for
       declaring your data, but extracting it as a tree, at least in my
       opinion, isn't easy. To fix that, I thought of accessing the data
       with some query language, like SQL. The first idea was to access it
       using something like:
1011
1012         XML.foo.bar.baz{arg1}
1013
1014         X = XML.foo.bar*
1015         X.baz{arg1}
1016
1017         XML.hosts.server[0]{argx}
1018
       And I saw that this is very similar to Hashes and Arrays in Perl:
1020
1021         $XML->{foo}{bar}{baz}{arg1} ;
1022
1023         $X = $XML->{foo}{bar} ;
1024         $X->{baz}{arg1} ;
1025
1026         $XML->{hosts}{server}[0]{argx} ;
1027
       But the problem with Hashes and Arrays is not knowing whether you
       have an Array reference or not. For example, in XML::Simple:

         ## This is very different
         $XML->{server}{address} ;
         ## ... from this:
         $XML->{server}{address}[0] ;
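
       In a plain Perl data tree the difference shows up as in the sketch
       below. The data and the helper name first_value are made up for
       illustration; the point is that the same expression works for one
       shape of the data and dies for the other.

```perl
use strict;
use warnings;

# Two shapes the same XML element can take in a plain Perl data tree.
my $single = { server => { address => '192.168.0.1' } };
my $multi  = { server => { address => [ '192.168.0.1', '192.168.0.2' ] } };

# Indexing works only when the value really is an ARRAY reference.
print $multi->{server}{address}[0], "\n";

# With a plain string, the same expression dies under "strict refs".
my $ok = eval { my $first = $single->{server}{address}[0]; 1 };
print "failed: $@" unless $ok;

# The usual workaround: check the type before every access.
sub first_value {
    my ($value) = @_;
    return ref $value eq 'ARRAY' ? $value->[0] : $value;
}

print first_value($single->{server}{address}), "\n";
print first_value($multi->{server}{address}), "\n";
```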
1035
       So, why not make both ways work? Because you need to do something
       crazy!

       To create XML::Smart, I first created the module Object::MultiType.
       With it you can have an object that works at the same time as a
       HASH, ARRAY, SCALAR, CODE & GLOB. So you can do things like this with
       the same object:
1043
1044         $obj = Object::MultiType->new() ;
1045
1046         $obj->{key} ;
1047         $obj->[0] ;
1048         $obj->method ;
1049
1050         @l = @{$obj} ;
1051         %h = %{$obj} ;
1052
1053         &$obj(args) ;
1054
1055         print $obj "send data\n" ;
1056
       This seems crazy, and it gets crazier if you use tie() inside it,
       which is what XML::Smart does.
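
       The general idea of one object answering to several access styles
       can be sketched with Perl's core overload pragma. This is not how
       Object::MultiType is implemented; the class name MultiTypeSketch
       and its slot names are hypothetical, and GLOB and tie() support are
       left out.

```perl
use strict;
use warnings;

package MultiTypeSketch;

# Dereference and string operators are routed to different slots.
use overload
    '%{}' => sub { ${ $_[0] }->{hash} },
    '@{}' => sub { ${ $_[0] }->{array} },
    '&{}' => sub { ${ $_[0] }->{code} },
    '""'  => sub { ${ $_[0] }->{scalar} },
    fallback => 1;

sub new {
    my ($class, %slots) = @_;
    # The object is a blessed SCALAR ref, so ${...} is not overloaded
    # and the handlers above can reach the slots without recursion.
    my $slots = {%slots};
    return bless \$slots, $class;
}

sub method { return 'method called' }

package main;

my $obj = MultiTypeSketch->new(
    hash   => { key => 'value' },
    array  => [ 'first', 'second' ],
    code   => sub { return "called with @_" },
    scalar => 'plain text',
);

print $obj->{key}, "\n";     # hash access
print $obj->[0], "\n";       # array access
print $obj->('args'), "\n";  # code call
print "$obj\n";              # string value
print $obj->method, "\n";    # ordinary method
```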
1059
       For XML::Smart, Hash and Array access passes through tie(). In other
       words, you have a tied HASH and a tied ARRAY inside it. The tied Hash
       and Array work together, so you can access a Hash key as index 0 of
       an Array, or access index 0 as the Hash key:
1064
1065         %hash = (
1066         key => ['a','b','c']
1067         ) ;
1068
1069         $hash->{key}    ## return $hash{key}[0]
1070         $hash->{key}[0] ## return $hash{key}[0]
1071         $hash->{key}[1] ## return $hash{key}[1]
1072
1073         ## Inverse:
1074
1075         %hash = ( key => 'a' ) ;
1076
1077         $hash->{key}    ## return $hash{key}
1078         $hash->{key}[0] ## return $hash{key}
1079         $hash->{key}[1] ## return undef
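
       The lookup rule shown above can be written as a plain function over
       an ordinary Perl hash. This is a sketch of the rule only, without
       tie(); the helper name smart_fetch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper that applies the lookup rule described above to a
# plain Perl hash: a scalar behaves like a one-element list, and a list
# accessed without an index yields its first element.
sub smart_fetch {
    my ($hash, $key, $index) = @_;
    $index = 0 unless defined $index;

    my $value = $hash->{$key};
    return $value->[$index] if ref $value eq 'ARRAY';
    return $index == 0 ? $value : undef;
}

my %list   = ( key => [ 'a', 'b', 'c' ] );
my %single = ( key => 'a' );

print smart_fetch(\%list, 'key'), "\n";       # a
print smart_fetch(\%list, 'key', 1), "\n";    # b
print smart_fetch(\%single, 'key', 0), "\n";  # a
print defined(smart_fetch(\%single, 'key', 1)) ? "defined\n" : "undef\n";  # undef
```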
1080
       The best thing about this feature is that it avoids wrong access to
       the data, and the warnings you get when you try to access a Hash
       where there is an Array (and the inverse), which generally make the
       script die().

       Once you have easy access to the data, you can use the same feature
       to create data! For example:
1087
         ## Previous data:
         <hosts>
           <server address="192.168.2.100" os="linux" type="conectiva" version="9.0"/>
         </hosts>

         ## Now you have {address} as a normal key with a string inside:
         $XML->{hosts}{server}{address}

         ## To add a new address, the key {address} needs to be an ARRAY ref!
         ## So, XML::Smart makes the conversion: ;-P
         $XML->{hosts}{server}{address}[1] = '192.168.2.101' ;

         ## Adding to a list when you don't know its size:
         push(@{$XML->{hosts}{server}{address}} , '192.168.2.102') ;

         ## The data now:
         <hosts>
           <server os="linux" type="conectiva" version="9.0">
             <address>192.168.2.100</address>
             <address>192.168.2.101</address>
             <address>192.168.2.102</address>
           </server>
         </hosts>
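
       The scalar-to-list conversion above can be imitated on a plain Perl
       hash. This is only a sketch of the idea, not XML::Smart's tied
       implementation; the helper name smart_store is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: store a value at an index, promoting an existing
# scalar to an ARRAY reference first.
sub smart_store {
    my ($hash, $key, $index, $value) = @_;
    my $current = $hash->{$key};
    if (ref $current ne 'ARRAY') {
        $hash->{$key} = defined $current ? [$current] : [];
    }
    $hash->{$key}[$index] = $value;
    return;
}

my %server = ( address => '192.168.2.100', os => 'linux' );

# The existing string becomes element 0, the new value element 1.
smart_store(\%server, 'address', 1, '192.168.2.101');

# Once it is a list, push works without knowing the size.
push @{ $server{address} }, '192.168.2.102';

print "<address>$_</address>\n" for @{ $server{address} };
```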
1111
       Then, after changing your XML tree using the Hash and Array resources,
       you just get the data rebuilt (from the Hash tree inside the object):
1114
1115         my $xmldata = $XML->data ;
1116
       But note that XML::Smart always returns an object, even when you get
       a final key. So this actually returns another object, pointing
       (inside it) to the key:
1120
         $addr = $XML->{hosts}{server}{address}[0] ;

         ## Since $addr is an object you can TRY to access more data:
         $addr->{foo}{bar} ; ## This doesn't produce warnings! It just returns UNDEF.

         ## But you can use it like a normal SCALAR too:

         print "$addr\n" ;

         $addr .= ':80' ; ## After this $addr isn't an object any more, just a SCALAR!
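
       Why an object can print like a string, and why '.=' leaves a plain
       SCALAR behind, can be shown with a small class that overloads only
       stringification. The class NodeSketch is hypothetical and much
       simpler than a real XML::Smart node.

```perl
use strict;
use warnings;

package NodeSketch;

# Only stringification is defined; fallback lets Perl derive '.', 'eq'
# and similar operators from the string value.
use overload '""' => sub { $_[0]{value} }, fallback => 1;

sub new {
    my ($class, $value) = @_;
    return bless { value => $value }, $class;
}

package main;

my $addr = NodeSketch->new('192.168.2.100');
print ref($addr), "\n";   # NodeSketch
print "$addr\n";          # 192.168.2.100

# Concatenation works on the string value and stores a plain string.
$addr .= ':80';
print ref($addr) eq '' ? "plain scalar: $addr\n" : "still an object\n";
```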
1131

TODO

1133         * Finish XPath implementation.
1134         * DTD - Handle <!DOCTYPE> gracefully.
1135         * Implement a better way to declare meta tags.
         * Add 0x80, 0x81, 0x8d, 0x8f, 0x90, 0xa0 (multi-byte characters) to the list of accepted binary characters.
1137         * Ensure object copy holds more in state including: ->data( wild => 1 )
1138

SEE ALSO

1140       XML::Parser, XML::Parser::Lite, XML::XPath, XML.
1141
       Object::MultiType - This is the module that makes everything possible,
       and was created specially for XML::Smart. ;-P
1144
1145       ** See the test.pl script for examples of use.
1146
1147       XML.com <http://www.xml.com>
1148

AUTHOR

1150       Graciliano M. P. "<gm at virtuasites.com.br>"
1151
       I will appreciate any type of feedback (including your opinions and/or
       suggestions). ;-P

       Enjoy, and thanks to everyone who is enjoying this tool and has sent
       e-mails! ;-P
1157

CURRENT MAINTAINER

1159       Harish Madabushi, "<harish.tmh at gmail.com>"
1160

BUGS

1162       Please report any bugs or feature requests to "bug-xml-smart at
1163       rt.cpan.org", or through the web interface at
1164       <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=XML-Smart>.  Both the
1165       author and the maintainer will be notified, and then you'll
1166       automatically be notified of progress on your bug as changes are made.
1167

SUPPORT

1169       You can find documentation for this module with the perldoc command.
1170
1171           perldoc XML::Smart
1172
1173       You can also look for information at:
1174
1175       •    RT: CPAN's request tracker (report bugs here)
1176
1177            <http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-Smart>
1178
1179       •    AnnoCPAN: Annotated CPAN documentation
1180
1181            <http://annocpan.org/dist/XML-Smart>
1182
1183       •    CPAN Ratings
1184
1185            <http://cpanratings.perl.org/d/XML-Smart>
1186
1187       •    Search CPAN
1188
1189            <http://search.cpan.org/dist/XML-Smart/>
1190
1191       •    GitHub CPAN
1192
1193            <https://github.com/harishmadabushi/XML-Smart>
1194

THANKS

1196       Thanks to Rusty Allen for the extensive tests of CDATA and BINARY
1197       handling of XML::Smart.
1198
       Thanks to Ted Haining for pointing out a Perl-5.8.0 bug with tied keys
       of a HASH.

       Thanks to everybody who has sent ideas or patches, or pointed out bugs.
1203
1205       Copyright 2003 Graciliano M. P.
1206
1207       This program is free software; you can redistribute it and/or modify it
1208       under the same terms as Perl itself.
1209
1210
1211
1212perl v5.36.0                      2023-01-20                     XML::Smart(3)