XML::Smart(3)         User Contributed Perl Documentation        XML::Smart(3)

NAME

       XML::Smart - A smart, easy and powerful way to access/create XML
       files/data.

DESCRIPTION

       This module offers an easy way to access and create XML data. It is
       based on the HASH tree built from the XML data, and enables dynamic
       access to it with the Perl syntax for Hash and Array, without needing
       to care whether you have a Hash or an Array in the tree. In other
       words, each point in the tree works as a Hash and an Array at the
       same time!

       You also have extra resources, like a search for nodes by attribute,
       selection of an attribute value in each of multiple nodes, changing
       the returned format, etc.

       The module also automatically handles binary data (encoding/decoding
       to/from base64), CDATA (like contents with <tags>) and Unicode. It can
       be used to create XML files, load XML from the Web (just pass a URL
       as the file path), and it has an easy way to send XML data through a
       socket, just by adding the length of the data in the <?xml?> header.

       You can use XML::Smart with XML::Parser, or with the 2 standard
       parsers of XML::Smart:

       XML::Smart::Parser
       XML::Smart::HTMLParser

       XML::Smart::HTMLParser can be used to load/parse wild/bad XML data, or
       HTML tags.

Tutorial and F.A.Q.

       You can find some extra documents about XML::Smart at:

       XML::Smart::Tutorial - Tutorial and examples for XML::Smart.
       XML::Smart::FAQ - Frequently Asked Questions about XML::Smart.

USAGE

         ## Create the object and load the file:
         my $XML = XML::Smart->new('file.xml') ;

         ## Force the use of the parser 'XML::Smart::Parser'.
         my $XML = XML::Smart->new('file.xml' , 'XML::Smart::Parser') ;

         ## Get from the web:
         my $XML = XML::Smart->new('http://www.perlmonks.org/index.pl?node_id=16046') ;

         ## Cut the root:
         $XML = $XML->cut_root ;

         ## Or change the root:
         $XML = $XML->{hosts} ;

         ## Get the address [0] of server [0]:
         my $srv0_addr0 = $XML->{server}[0]{address}[0] ;
         ## ...or...
         my $srv0_addr0 = $XML->{server}{address} ;

         ## Get the server where the attribute 'type' eq 'suse':
         my $server = $XML->{server}('type','eq','suse') ;

         ## Get the address again:
         my $addr1 = $server->{address}[1] ;
         ## ...or...
         my $addr1 = $XML->{server}('type','eq','suse'){address}[1] ;

         ## Get all the addresses of a server:
         my @addrs = @{$XML->{server}{address}} ;
         ## ...or...
         my @addrs = $XML->{server}{address}('@') ;

         ## Get a list of types of all the servers:
         my @types = $XML->{server}('[@]','type') ;

         ## Add a new server node:
         my $newsrv = {
           os      => 'Linux' ,
           type    => 'Mandrake' ,
           version => 8.9 ,
           address => [qw(192.168.3.201 192.168.3.202)]
         } ;

         push(@{$XML->{server}} , $newsrv) ;

         ## Get/rebuild the XML data:
         my $xmldata = $XML->data ;

         ## Save in some file:
         $XML->save('newfile.xml') ;

         ## Send through a socket:
         print $socket $XML->data(length => 1) ; ## show the 'length' in the XML header so the
                                                 ## socket knows the amount of data to read.

         __DATA__
         <?xml version="1.0" encoding="iso-8859-1"?>
         <hosts>
           <server os="linux" type="redhat" version="8.0">
             <address>192.168.0.1</address>
             <address>192.168.0.2</address>
           </server>
           <server os="linux" type="suse" version="7.0">
             <address>192.168.1.10</address>
             <address>192.168.1.20</address>
           </server>
           <server address="192.168.2.100" os="linux" type="conectiva" version="9.0"/>
         </hosts>

METHODS

   new (FILE|DATA|URL , PARSER , OPTIONS)
       Create an XML object.

       Arguments:

       FILE|DATA|URL
                 The first argument can be:

                   - XML data as a string.
                   - File path.
                   - File handle (GLOB).
                   - URL (needs LWP::UserAgent).

                 If not passed, a null XML tree is started, where you should
                 create your own XML data, then build/save/send it.

       PARSER (optional)
                 Set the XML parser to use. Options:

                   XML::Parser
                   XML::Smart::Parser
                   XML::Smart::HTMLParser

                 XML::Smart::Parser can only handle basic XML data (PCDATA
                 and headers like ENTITY, NOTATION, etc. are not supported),
                 but it is a good choice when you don't want to install big
                 modules to parse XML, since it comes with the main module.
                 It can still handle CDATA and binary data.

                 ** See "PARSING HTML as XML" for XML::Smart::HTMLParser.

                 Aliases for the options:

                   SMART|REGEXP   => XML::Smart::Parser
                   HTML           => XML::Smart::HTMLParser

                 Default:

                 If not set, it will look for XML::Parser and load it.  If
                 XML::Parser can't be loaded, it will use XML::Smart::Parser,
                 which is actually a clone of XML::Parser::Lite with some
                 fixes.

       OPTIONS   You can force upper case or lower case for tags (nodes)
                 and arguments (attributes), and other extra things.

                 lowtag    Make the tags lower case.

                 lowarg    Make the arguments lower case.

                 upertag   Make the tags upper case.

                 uperarg   Make the arguments upper case.

                 arg_single
                           Set the value of arguments to 1 when they have an
                           undef value.

                           ** This option works only when the XML is parsed
                           by XML::Smart::HTMLParser, since it accepts
                           arguments without values:

                             my $xml = XML::Smart->new(
                               '<root><foo arg1="" flag></root>' ,
                               'XML::Smart::HTMLParser' ,
                               arg_single => 1 ,
                             ) ;

                           In this example the option "arg_single" was used,
                           which will set flag to 1, but arg1 will still
                           have a null string value ("").

                           Here's the tree of the example above:

                             'root' => {
                                         'foo' => {
                                                    'flag' => 1,
                                                    'arg1' => ''
                                                  },
                                       },

                 use_spaces
                           Accept contents that have only spaces.

                 on_start (CODE) *optional
                           Code/sub to call at the start of a tag.

                           ** This will be called after XML::Smart parses
                           the tag, and should be used only if you want to
                           change the tree.

                 on_char (CODE) *optional
                           Code/sub to call on content.

                           ** This will be called after XML::Smart parses
                           the tag, and should be used only if you want to
                           change the tree.

                 on_end (CODE) *optional
                           Code/sub to call at the end of a tag.

                           ** This will be called after XML::Smart parses
                           the tag, and should be used only if you want to
                           change the tree.

                 ** These options are applied when the XML data is loaded.
                 For XML generation see data() OPTIONS.

       Examples of use:

         my $xml_from_url = XML::Smart->new("http://www.perlmonks.org/index.pl?node_id=16046") ;

         ...

         my $xml_from_str = XML::Smart->new(q`<?xml version="1.0" encoding="iso-8859-1" ?>
         <root>
           <foo arg="xyz"/>
         </root>
         `) ;

         ...

         my $null_xml = XML::Smart->new() ;

         ...

         my $xml_from_html = XML::Smart->new($html_data , 'html' ,
           lowtag => 1 ,
           lowarg => 1 ,
           on_char => sub {
                        my ( $tag , $pointer , $pointer_back , $cont) = @_ ;
                        $pointer->{extra_arg} = 123 ; ## add an extra argument.
                        $pointer_back->{$tag}{extra_arg} = 123 ; ## Same, but using the previous pointer.
                        $$cont .= "\n" ; ## append data to the content.
                      }
         ) ;

   apply_dtd (DTD , OPTIONS)
       Apply the DTD to the XML tree.

       DTD can be a source, file, GLOB or URL.

       This method is useful if you need the XML generated by data() to be
       formatted according to a specific DTD: elements will be nodes
       automatically, attributes will be checked, required elements and
       attributes will be created, the element order will be set, etc.

       OPTIONS:

       no_delete BOOL
                 If TRUE, elements and attributes that are not defined in the
                 DTD won't be deleted from the XML tree.

       Example of use:

         $xml->apply_dtd(q`
         <!DOCTYPE cds [
         <!ELEMENT cds (album+)>
         <!ATTLIST cds
                   creator  CDATA
                   date     CDATA #REQUIRED
                   type     (a|b|c) #REQUIRED "a"
         >
         <!ELEMENT album (#PCDATA)>
         ]>
         ` ,
         no_delete => 1 ,
         );

   args()
       Return the argument names (not nodes).

   args_values()
       Return the argument values (not nodes).

   back()
       Move the pointer back one level in the tree.

       ** See base().

   base()
       Go back to the base of the tree.

       Each query to the XML::Smart object returns an object pointing to a
       different place in the tree (and sharing the same HASH tree). So, you
       can get the main object again (an object that points to the base):

         my $srv = $XML->{root}{hosts}{server} ;
         my $addr = $srv->{address} ;
         my $XML2 = $srv->base() ;
         $XML2->{root}{hosts}...

295
296       Each query to the XML::Smart object return an object pointing to a
297       different place in the tree (and share the same HASH tree). So, you can
298       get the main object again (an object that points to the base):
299
300         my $srv = $XML->{root}{host}{server} ;
301         my $addr = $srv->{adress} ;
302         my $XML2 = $srv->base() ;
303         $XML2->{root}{hosts}...
304
305   content()
306       Return the content of a node:
307
308         ## Data:
309         <foo>my content</foo>
310
311         ## Access:
312
313         my $content = $XML->{foo}->content ;
314         print "<<$content>>\n" ; ## show: <<my content>>
315
316         ## or just:
317         my $content = $XML->{foo} ;
318
319       Also can be used with multiple contents:
320
321       For this XML data:
322
323         <root>
324         content0
325         <tag1 arg="1"/>
326         content1
327         </root>
328
329       Getting all the content:
330
331         my $all_content = $XML->{root}->content ;
332         print "[$all_content]\n" ;
333
334       Output:
335
336         [
337         content0
338
339         content1
340         ]
341
342       Getting in parts:
343
344         my @contents = $XML->{root}->content ;
345         print "[@contents[0]]\n" ;
346         print "[@contents[1]]\n" ;
347
348       Output
349
350         [
351         content0
352         ]
353         [
354         content1
355         ]
356
357       Setting multiple contents:
358
359         $XML->{root}->content(0,"aaaaa") ;
360         $XML->{root}->content(1,"bbbbb") ;
361
362       Output now will be:
363
364         [aaaaa]
365         [bbbbb]
366
367       And now the XML data generated will be:
368
369         <root>aaaaa<tag1 arg="1"/>bbbbb</root>
370
   copy()
       Return a copy of the XML::Smart object (pointing to the base).

       ** This is good when you want to keep 2 versions of the same XML tree
       in memory, since one object can't change the tree of the other!

   cut_root()
       Cut the root key:

         my $srv = $XML->{rootx}{host}{server} ;

         ## Or if you don't know the root name:
         $XML = $XML->cut_root() ;
         my $srv = $XML->{host}{server} ;

       ** Note that this will cut the root of the pointer in the tree.  So, if
       you are in some place that has more than one key (multiple roots), the
       same object will be returned without cutting anything.

   data (OPTIONS)
       Return the data of the XML object (rebuilding it).

       Options:

       nodtd     Do not add to the XML content the DTD applied by the method
                 apply_dtd().

       noident   If set to true, the data isn't indented.

       nospace   If set to true, the data isn't indented and doesn't have
                 space between the tags (unless the CONTENT has it).

       lowtag    Make the tags lower case.

       lowarg    Make the arguments lower case.

       upertag   Make the tags upper case.

       uperarg   Make the arguments upper case.

       length    If set to true, add the attribute 'length' with the size of
                 the data to the xml header (<?xml ...?>).  This is useful
                 when you send the data through a socket, since the receiver
                 can know the total amount of data to read.

       noheader  Do not add the <?xml ...?> header.

       nometagen Do not add the meta generator tag: <?meta
                 generator="XML::Smart" ?>

       meta      Set the meta tags of the XML document.

                 Examples:

                     my $meta = {
                       build_from => "wxWindows 2.4.0" ,
                       file => "wx26.htm" ,
                     } ;

                     print $XML->data( meta => $meta ) ;

                     __DATA__
                     <?meta build_from="wxWindows 2.4.0" file="wx26.htm" ?>

                 Multiple meta:

                     my $meta = [
                       {build_from => "wxWindows 2.4.0" , file => "wx26.htm" } ,
                       {script => "genxml.pl" , ver => "1.0" } ,
                     ] ;

                     __DATA__
                     <?meta build_from="wxWindows 2.4.0" file="wx26.htm" ?>
                     <?meta script="genxml.pl" ver="1.0" ?>

                 Or set the meta tag directly:

                     my $meta = '<?meta foo="bar" ?>' ;

                     ## For multiple:
                     my $meta = ['<?meta foo="bar" ?>' , '<?meta x="1" ?>'] ;

                     print $XML->data( meta => $meta ) ;

       tree      Set the HASH tree to parse. If not set, the tree of the
                 XML::Smart object (tree()) is used.

       wild      Accept wild tags and arguments.

                 ** This won't fix wrong keys and tags.

       sortall   Sort all the tags alphabetically. If not set, the order of
                 the loaded document, or the order of tag creation, is kept.
                 Default: off

   data_pointer (OPTIONS)
       Build the XML data from the current point in the tree (not from the
       base, as data() does).

       Accepts the same OPTIONS as the method data().

   dump_tree()
       Dump the tree of the object using Data::Dumper.

   dump_tree_pointer()
       Dump the tree of the object, from the pointer, using Data::Dumper.

   dump_pointer()
       ** Same as dump_tree_pointer().

   i()
       Return the index of the value.

       ** If the value is from a hash key (not an ARRAY ref), undef is
       returned.

   is_node()
       Return whether a key is a node.

   key()
       Return the key of the value.

       In list context (wantarray) the index is returned too:
       return(KEY , I) ;

   nodes()
       Return the nodes (objects) in the pointer (keys that aren't arguments).

   nodes_keys()
       Return the node names (not the objects) in the pointer (keys that
       aren't arguments).

   null()
       Return true if the XML object has a null tree or if the pointer is in
       some place that doesn't exist.

   order()
       Return the order of the keys. See set_order().

   path()
       Return the path of the pointer.

       Example:

         /hosts/server[1]/address[0]

       Note that the index is 0 based and 'address' can be an attribute or a
       node, which is not compatible with XPath.

       ** See path_as_xpath().

   path_as_xpath()
       Return the path of the pointer in the XPath format.

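       One of the differences noted for path() is the index base: path()
       counts from 0, while XPath positions count from 1. The sketch below
       shows only that index shift. The helper name is hypothetical and is
       not part of XML::Smart, and it does not tell attributes from nodes
       (an attribute step would be written @address in XPath), so it is not
       a replacement for path_as_xpath().

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Smart): shift every 0-based
# [index] in a path() string to the 1-based form that XPath expects.
sub shift_indexes_to_xpath {
    my ($path) = @_;
    (my $xpath = $path) =~ s/\[(\d+)\]/'[' . ($1 + 1) . ']'/ge;
    return $xpath;
}

print shift_indexes_to_xpath('/hosts/server[1]/address[0]'), "\n";
# prints: /hosts/server[2]/address[1]
```
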
   pointer
       Return the HASH tree from the pointer.

   pointer_ok
       Return a copy of the tree of the object, from the pointer, but without
       internal keys added by XML::Smart.

   root
       Return the ROOT name of the XML tree (main key).

       ** See also key() for sub nodes.

   save (FILEPATH , OPTIONS)
       Save the XML data inside a file.

       Accepts the same OPTIONS as the method data().

   set_auto
       Define the key to be handled automatically. So, data() will determine
       automatically whether it's a node, content or attribute.

       ** This method is useful to remove set_node(), set_cdata() and
       set_binary() changes.

   set_auto_node
       Define the key as a node, and data() will determine automatically
       whether it's CDATA or BINARY.

       ** This method is useful to remove set_cdata() and set_binary()
       changes.

   set_binary(BOOL)
       Define the node as BINARY content when TRUE, or force it to not be
       handled as BINARY when FALSE.

       Example of a node handled as BINARY:

         <root><foo dt:dt="binary.base64">PGgxPnRlc3QgAzwvaDE+</foo></root>

       Original content of foo (the data encoded as base64):

         <h1>test \x03</h1>

   set_cdata(BOOL)
       Define the node as CDATA when TRUE, or force it to not be handled as
       CDATA when FALSE.

       Example of a CDATA node:

         <root><foo><![CDATA[bla bla bla <tag> bla bla]]></foo></root>

   set_node(BOOL)
       Set/unset the current key as a node (tag).

       ** If BOOL is not defined, TRUE is used.

   set_order(KEYS)
       Set the order of the keys (nodes and attributes) at this point.

   set_tag
       Same as set_node.

   tree()
       Return the HASH tree of the XML data.

       ** Note that the real HASH tree is returned here. All the other ways
       return an object that works like a HASH/ARRAY through tie.

   tree_pointer()
       Same as pointer().

   tree_ok()
       Return a copy of the tree of the object, but without internal keys
       added by XML::Smart, like /order and /nodes.

   tree_pointer_ok()
       Return a copy of the tree of the object, from the pointer, but without
       internal keys added by XML::Smart.

   xpath() || XPath()
       Return an XML::XPath object, based on the XML root in the tree.

         ## look from the root:
         my $data = $XML->XPath->findnodes_as_string('/') ;

       ** Needs XML::XPath installed, but it is only loaded when needed.

   xpath_pointer() || XPath_pointer()
       Return an XML::XPath object, based on the XML::Smart pointer in the
       tree.

         ## look from this point, so XPath '/' actually starts at /server/:

         my $srvs = $XML->{server} ;
         my $data = $srvs->XPath_pointer->findnodes_as_string('/') ;

       ** Needs XML::XPath installed, but it is only loaded when needed.

ACCESS

       To access the data you use the object in a way similar to HASH and
       ARRAY:

         my $XML = XML::Smart->new('file.xml') ;

         my $server = $XML->{server} ;

       But when you get a key {server}, you are actually accessing the data
       through tie(), not directly the HASH tree inside the object. (This
       will fix wrong accesses):

         ## {server} is a normal key, not an ARRAY ref:

         my $server = $XML->{server}[0] ; ## return $XML->{server}
         my $server = $XML->{server}[1] ; ## return UNDEF

         ## {server} has an ARRAY with 2 items:

         my $server = $XML->{server} ;    ## return $XML->{server}[0]
         my $server = $XML->{server}[0] ; ## return $XML->{server}[0]
         my $server = $XML->{server}[1] ; ## return $XML->{server}[1]

       To get all the values of multiple elements/keys:

         ## This works with only a string inside {address}, or with an ARRAY ref:
         my @addresses = @{$XML->{server}{address}} ;

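       For contrast, here is a minimal sketch of the check you would have to
       write yourself on a plain HASH tree (one without the tie() layer),
       where a key may hold either a single string or an ARRAY ref. The
       helper name values_of and the sample data are assumptions made for
       illustration; they are not part of XML::Smart.

```perl
use strict;
use warnings;

# Hypothetical helper: return the values under a key as a list, whether
# the plain HASH tree stores a single scalar or an ARRAY ref there.
sub values_of {
    my ($node, $key) = @_;
    my $value = $node->{$key};
    return () unless defined $value;
    return ref($value) eq 'ARRAY' ? @{$value} : ($value);
}

my $one  = { address => '192.168.2.100' };
my $many = { address => [ '192.168.0.1', '192.168.0.2' ] };

print join(',', values_of($one,  'address')), "\n";   # 192.168.2.100
print join(',', values_of($many, 'address')), "\n";   # 192.168.0.1,192.168.0.2
```

       With the tied object, the single expression @{$XML->{server}{address}}
       covers both cases.
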
   Select search
       When you don't know the position of the nodes, you can select them by
       some attribute value:

         my $server = $XML->{server}('type','eq','suse') ; ## return $XML->{server}[1]

       Syntax for the select search:

         (NAME , CONDITION , VALUE)

       NAME      The attribute name in the node (tag).

       CONDITION Can be

                   eq  ne  ==  !=  <=  >=  <  >

                 For REGEX:

                   =~  !~

                   ## Case insensitive:
                   =~i !~i

       VALUE     The value.

                 For REGEX use it like this:

                   $XML->{server}('type','=~','^s\w+$') ;

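       To make the (NAME , CONDITION , VALUE) semantics concrete, here is a
       sketch of an equivalent filter over a plain list of HASH refs. It
       only illustrates what each condition means; it is not how XML::Smart
       implements the search, and select_nodes and the sample data are
       hypothetical.

```perl
use strict;
use warnings;

# Condition name => comparison sub; mirrors the CONDITION list above.
my %CONDITION = (
    'eq'  => sub { my ($l, $r) = @_; $l eq $r },
    'ne'  => sub { my ($l, $r) = @_; $l ne $r },
    '=='  => sub { my ($l, $r) = @_; $l == $r },
    '!='  => sub { my ($l, $r) = @_; $l != $r },
    '<='  => sub { my ($l, $r) = @_; $l <= $r },
    '>='  => sub { my ($l, $r) = @_; $l >= $r },
    '<'   => sub { my ($l, $r) = @_; $l <  $r },
    '>'   => sub { my ($l, $r) = @_; $l >  $r },
    '=~'  => sub { my ($l, $r) = @_; $l =~ /$r/ },
    '!~'  => sub { my ($l, $r) = @_; $l !~ /$r/ },
    '=~i' => sub { my ($l, $r) = @_; $l =~ /$r/i },
    '!~i' => sub { my ($l, $r) = @_; $l !~ /$r/i },
);

# Hypothetical helper: keep the nodes whose attribute NAME satisfies
# CONDITION against VALUE. Nodes lacking the attribute are skipped.
sub select_nodes {
    my ($nodes, $name, $condition, $value) = @_;
    my $test = $CONDITION{$condition}
        or die "Unknown condition '$condition'\n";
    return grep { defined $_->{$name} && $test->($_->{$name}, $value) } @{$nodes};
}

my @servers = (
    { type => 'redhat',    version => '8.0' },
    { type => 'suse',      version => '7.0' },
    { type => 'conectiva', version => '9.0' },
);

my ($suse)  = select_nodes(\@servers, 'type', 'eq', 'suse');
my @s_types = select_nodes(\@servers, 'type', '=~', '^s\w+$');
my @newer   = select_nodes(\@servers, 'version', '>=', 8);

print "$suse->{version}\n";                         # 7.0
print join(',', map { $_->{type} } @newer), "\n";   # redhat,conectiva
```
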
   Select attributes in multiple nodes:
       You can get the list of values of an attribute by looking in all the
       multiple nodes:

         ## Get all the server types:
         my @types = $XML->{server}('[@]','type') ;

       Also as:

         my @types = $XML->{server}{type}('<@') ;

       Without the resource:

         my @list ;
         my @servers = @{$XML->{server}} ;

         foreach my $servers_i ( @servers ) {
           push(@list , $servers_i->{type} ) ;
         }

   Return format
       You can change the returned format:

       Syntax:

         (TYPE)

       Where TYPE can be:

         $  ## the content.
         @  ## an array (list of multiple values).
         %  ## a hash.
         .  ## The exact point in the tree, not an object.

         $@  ## an array, but with the contents, not objects.
         $%  ## a hash, but the values are the contents, not objects.

         ## The use of $@ and $% is good if you don't want to keep the object
         ## reference (and save memory).

         @keys  ## The keys of the node. Note that if you have a key with
                ## multiple nodes, it will be replicated (this is the
                ## difference from "keys %{$this->{node}}" ).

         <@ ## Return the attribute in the previous node, but looking in
            ## multiple nodes. Example:

         my @names = $this->{method}{wxFrame}{arg}{name}('<@') ;
         #### @names = (parent , id , title) ;

         <xml> ## Return the XML data from this point.

         __DATA__
         <method>
           <wxFrame return="wxFrame">
             <arg name="parent" type="wxWindow" />
             <arg name="id" type="wxWindowID" />
             <arg name="title" type="wxString" />
           </wxFrame>
         </method>

       Example:

         ## A server's content
         my $name = $XML->{server}{name}('$') ;
         ## ... or:
         my $name = $XML->{server}{name}->content ;
         ## ... or:
         my $name = $XML->{server}{name} ;
         $name = "$name" ;

         ## All the servers
         my @servers = $XML->{server}('@') ;
         ## ... or:
         my @servers = @{$XML->{server}} ;

         ## It still has the object reference:
         $servers[0]->{name} ;

         ## Without the reference:
         my @servers = $XML->{server}('$@') ;

         ## The XML data, same as data_pointer():
         my $xml_data = $XML->{server}('<xml>') ;

   CONTENT
       If a {key} has a content you can access it directly from the variable
       or from the method:

         my $server = $XML->{server} ;

         print "Content: $server\n" ;
         ## ...or...
         print "Content: ". $server->content ."\n" ;

       So, if you use the object as a string it works as a string, and if
       you use it as an object it works as an object! ;-P

       ** See the method content() for more.

CREATING XML DATA

       Creating XML data is easy: you just use the object as a normal HASH,
       but you don't need to care about multiple nodes and ARRAY
       creation/conversion!

         ## Create a null XML object:
         my $XML = XML::Smart->new() ;

         ## Add a server to the list:
         $XML->{server} = {
           os => 'Linux' ,
           type => 'mandrake' ,
           version => 8.9 ,
           address => '192.168.3.201' ,
         } ;

         ## The data now:
         <server address="192.168.3.201" os="Linux" type="mandrake" version="8.9"/>

         ## Add a new address to the server. This creates an ARRAY and
         ## converts the previous key to an ARRAY:
         $XML->{server}{address}[1] = '192.168.3.202' ;

         ## The data now:
         <server os="Linux" type="mandrake" version="8.9">
           <address>192.168.3.201</address>
           <address>192.168.3.202</address>
         </server>

       After creating your XML tree you just save it or get the data:

         ## Get the data:
         my $data = $XML->data ;

         ## Or save it directly:
         $XML->save('newfile.xml') ;

         ## Or send to a socket:
         print $socket $XML->data(length => 1) ;

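       On the receiving side, the 'length' attribute lets a reader know how
       much to read before the message is complete. The following is a
       minimal sketch of that idea, not code from XML::Smart. It ASSUMES
       that 'length' counts the bytes that follow the <?xml ...?> header
       line, which should be checked against the real output of
       data(length => 1). The sub read_framed_xml is hypothetical, and an
       in-memory filehandle stands in for the socket.

```perl
use strict;
use warnings;

# ASSUMPTION for this sketch: the 'length' attribute counts the bytes
# that follow the <?xml ...?> header line. Check the real output of
# data(length => 1) before relying on this.
sub read_framed_xml {
    my ($fh) = @_;
    my $header = <$fh>;
    defined $header or die "No header received\n";
    my ($length) = $header =~ /<\?xml\b[^>]*\blength="(\d+)"/
        or die "No length attribute in header\n";
    my $body = '';
    while (length($body) < $length) {
        # Append to $body until the announced number of bytes has arrived.
        my $got = read($fh, $body, $length - length($body), length($body));
        die "Stream ended early\n" unless $got;
    }
    return $header . $body;
}

# Simulate a socket with an in-memory filehandle holding one message
# followed by the start of another.
my $body   = "<root><foo arg=\"xyz\"/></root>\n";
my $header = sprintf qq{<?xml version="1.0" encoding="iso-8859-1" length="%d" ?>\n}, length $body;
my $stream = $header . $body . "NEXT MESSAGE";
open my $fh, '<', \$stream or die "Cannot open in-memory handle: $!";

my $xml = read_framed_xml($fh);
print $xml;
```

       The reader stops exactly at the end of the first message, so whatever
       follows on the stream is left for the next read.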

BINARY DATA & CDATA

       Since version 1.2, XML::Smart can handle binary data and CDATA blocks
       automatically.

       When parsing, binary data will be detected as:

         <code dt:dt="binary.base64">f1NPTUUgQklOQVJZIERBVEE=</code>

       This is the official format for binary data at XML.com
       <http://www.xml.com/pub/a/98/07/binary/binary.html>.  The content
       will be decoded from base64 and saved in the object tree.

       CDATA will be parsed as any other content, since CDATA is only a block
       that won't be parsed.

       When creating XML data, like at $XML->data(), the binary format and
       CDATA are detected using these rules:

         BINARY:
         - If it has characters that can't be in XML.

         * Characters accepted:

           \s \w \d
           !"#$%&'()*+,-./:;<=>?@[\]^`{|}~
           (plus printable extended ISO-8859-1 characters, such as the
           accented letters)

         CDATA:
         - If it has tags: <...>

         CONTENT: (<tag>content</tag>)
         - If it has \r\n\t, or ' and " at the same time.

       So, this will be a CDATA content:

         <code><![CDATA[
           line1
           <tag_not_parsed>
           line2
         ]]></code>

       If a binary content is detected, it will be converted to base64 and a
       dt:dt attribute will be added to the tag to tell the format.

         <code dt:dt="binary.base64">f1NPTUUgQklOQVJZIERBVEE=</code>

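       The detection rules above can be approximated with a few regular
       expressions. This is a simplified stand-in, not the module's own
       code: the set of "characters that can't be in XML" is ASSUMED here
       to be the control characters other than tab, newline and carriage
       return (plus \x7F), and the sub name and its return labels are
       hypothetical. MIME::Base64 is a core Perl module.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Simplified stand-in for the rules above, NOT the module's own code.
# Returns 'binary', 'cdata', 'content', or 'plain' when no rule applies.
sub classify_content {
    my ($value) = @_;
    # Assumed "not allowed in XML" set: control characters and \x7F.
    return 'binary' if $value =~ /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/;
    return 'cdata'  if $value =~ /<[^<>]+>/;
    return 'content'
        if $value =~ /[\r\n\t]/ || ($value =~ /'/ && $value =~ /"/);
    return 'plain';
}

my $binary = decode_base64('f1NPTUUgQklOQVJZIERBVEE=');   # "\x7FSOME BINARY DATA"

print classify_content($binary), "\n";                            # binary
print classify_content("line1\n<tag_not_parsed>\nline2"), "\n";   # cdata
print classify_content("line1\nline2"), "\n";                     # content
print classify_content('192.168.0.1'), "\n";                      # plain

# Binary values are emitted as base64, as in the example above.
print encode_base64($binary, ''), "\n";   # f1NPTUUgQklOQVJZIERBVEE=
```
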

UNICODE and ASCII-extended (ISO-8859-1)

       XML::Smart supports only these 2 encoding types, Unicode (UTF-8) and
       ASCII-extended (ISO-8859-1), which should be enough. (Note that UTF-8
       is only supported on Perl 5.8+.)

       When creating XML data, if any UTF-8 character is detected the encoding
       attribute in the <?xml ...?> header will be set to UTF-8:

         <?xml version="1.0" encoding="utf-8" ?>
         <data>A~X</data>

       If not, iso-8859-1 is used:

         <?xml version="1.0" encoding="iso-8859-1" ?>
         <data>X</data>

       When loading XML data with UTF-8, Perl (5.8+) should do all the work
       internally.

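       A rough sketch of how the header encoding could be chosen is shown
       below. It is not the module's own logic: it ASSUMES that "any UTF-8
       character" means any character with a code point above 255, that is,
       one that ISO-8859-1 cannot represent. The sub name pick_encoding is
       hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: "any UTF-8 character" is read here as "any character with a
# code point above 255", i.e. one that ISO-8859-1 cannot represent.
sub pick_encoding {
    my ($text) = @_;
    return $text =~ /[^\x00-\xFF]/ ? 'utf-8' : 'iso-8859-1';
}

printf qq{<?xml version="1.0" encoding="%s" ?>\n}, pick_encoding("caf\x{E9}");     # iso-8859-1
printf qq{<?xml version="1.0" encoding="%s" ?>\n}, pick_encoding("\x{263A} hi");   # utf-8
```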

PARSING HTML as XML, or BAD XML formats

       You can use the special parser XML::Smart::HTMLParser to "use" HTML as
       XML, or to load XML data that is not well-formed.

       The differences between a normal XML parser and XML::Smart::HTMLParser
       are:

         - Accepts values without quotes:
           <foo bar=x>

         - Accepts any data in the values, including <> and &:
           <root><echo sample="echo \"Hello!\">out.txt"></root>

         - Accepts URI values without quotes:
           <link url=http://www.foo.com/dir/file?query?q=v&x=y target=#_blank>

         - Doesn't need the tags to be closed by adding the '/' before '>':
           <root><foo bar="1"></root>

           ** Note that the parser will try hard to detect the nodes, and
              whether to auto-close them or not.

         - Doesn't need to have only one root:
           <foo>data</foo><bar>data</bar>

       So, XML::Smart::HTMLParser is a wild way to load marked-up data (like
       HTML), or to avoid caring about quotes, end tags, etc. when writing
       your XML data by hand.  You can write a bad XML file by hand, load it
       with XML::Smart::HTMLParser, and rewrite it well by saving it
       again! ;-P

       ** Note that <SCRIPT> tags will only be parsed correctly if the
       content is inside comments <!--...-->, since they can have tags:

         <SCRIPT LANGUAGE="JavaScript"><!--
         document.writeln("some <tag> in the string");
         --></SCRIPT>

ENTITIES

       Entities (ENTITY) are handled by the parser. So, if you use XML::Parser
       it will do the whole job fine.  But if you use XML::Smart::Parser or
       XML::Smart::HTMLParser, only the basic entities (defaults) will be
       parsed:

         &lt;   => The less than sign (<).
         &gt;   => The greater than sign (>).
         &amp;  => The ampersand (&).
         &apos; => The single quote or apostrophe (').
         &quot; => The double quote (").

         &#ddd;  => An ASCII character or a Unicode character (>255), where ddd is a decimal number.
         &#xHHH; => A Unicode character, where HHH is a hexadecimal number.

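       The table above can be sketched in plain Perl. This is only an
       illustration of the decoding rules, not the code used by
       XML::Smart::Parser; the helper name decode_basic_entities and the
       %BASIC table are made up for this example:

```perl
use strict;
use warnings;

# The five predefined entities from the table above.
my %BASIC = (lt => '<', gt => '>', amp => '&', apos => "'", quot => '"');

# Hypothetical helper: decode the basic and numeric entities in one pass.
# Anything else (for example &nbsp;) is left untouched.
sub decode_basic_entities {
    my ($text) = @_;
    $text =~ s{&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|(lt|gt|amp|apos|quot));}{
        defined $1 ? chr(hex $1)     # &#xHHH; hexadecimal reference
      : defined $2 ? chr($2)         # &#ddd;  decimal reference
      :              $BASIC{$3}      # named basic entity
    }ge;
    return $text;
}

print decode_basic_entities('AT&amp;T &lt;tag&gt; &#65;&#x42;'), "\n";
```

       Doing everything in a single substitution matters: text such as
       &amp;lt; is decoded once, to &lt;, and not a second time to <.
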
       When creating XML data, already existing entities won't be changed, and
       the characters '<', '&' and '>' will be converted to the appropriate
       entity.

       ** Note that if a content has a <tag>, the characters '<' and '>'
       won't be converted to entities, and this content will be placed inside
       a CDATA block.

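       A rough sketch of these two output rules in plain Perl follows. The
       helper encode_content is hypothetical, and the test used here to decide
       that a content "has a tag" is an assumption; XML::Smart may detect it
       differently:

```perl
use strict;
use warnings;

# Hypothetical helper: prepare text content for XML output.
sub encode_content {
    my ($text) = @_;

    # Content that looks like it holds a tag goes into a CDATA block as is.
    # A literal "]]>" would end the block early, so it is split in two.
    if ($text =~ /<[A-Za-z][^<>]*>/) {
        (my $safe = $text) =~ s/\]\]>/]]]]><![CDATA[>/g;
        return "<![CDATA[$safe]]>";
    }

    # Escape '&' unless it already starts an entity such as &amp; or &#65;.
    $text =~ s/&(?!(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print encode_content('1 < 2 & 3 &gt; 2'), "\n";
print encode_content('some <b>bold</b> text'), "\n";
```

       The first call escapes the bare '<' and '&' but keeps the existing
       &gt; as it is; the second call returns the text wrapped in a CDATA
       block.
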

WHY AND HOW IT WORKS

       Everyone who has tried to use Perl HASHes and ARRAYs to access XML
       data, like in XML::Simple, has had some problems adding new nodes, or
       accessing a node when the user doesn't know if it's inside an ARRAY, a
       HASH or a HASH key. XML::Smart creates around it a very dynamic way to
       access the data, since any node/point in the tree can be a HASH and an
       ARRAY at the same time. You also have other extra resources, like a
       search for nodes by attribute:

         my $server = $XML->{hosts}{server}('type','eq','suse') ; ## This syntax is not wrong! ;-)

         ## Instead of:
         my $server = $XML->{hosts}{server}[1] ;

         __DATA__
         <hosts>
           <server os="linux" type="redhat" version="8.0"/>
           <server os="linux" type="suse" version="7.0"/>
         </hosts>

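       To make clear what such a search does, here is the same lookup written
       over plain Perl hashes. It is only a sketch of the idea: the helper
       find_node and the %TEST table are invented for this example and cover
       just a few operators:

```perl
use strict;
use warnings;

# The two <server> nodes from the example, as plain hashes.
my @servers = (
    { os => 'linux', type => 'redhat', version => '8.0' },
    { os => 'linux', type => 'suse',   version => '7.0' },
);

# Operators supported by this sketch.
my %TEST = (
    'eq' => sub { $_[0] eq $_[1] },
    'ne' => sub { $_[0] ne $_[1] },
    '==' => sub { $_[0] == $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
);

# Hypothetical helper: return the first node whose attribute passes the test.
sub find_node {
    my ($nodes, $attr, $op, $value) = @_;
    my $test = $TEST{$op} or die "unsupported operator: $op";
    for my $node (@$nodes) {
        next unless defined $node->{$attr};
        return $node if $test->($node->{$attr}, $value);
    }
    return;
}

my $server = find_node(\@servers, 'type', 'eq', 'suse');
print "$server->{type} $server->{version}\n";
```

       The caller asks for the node by what it contains, so the code keeps
       working if the order of the <server> nodes changes.
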
       The idea for this module came from the problem of accessing a complex
       structure in XML.  You need to know what this structure looks like,
       something that is generally done by looking at the XML file (which is
       wrong).  But at the same time it is hard to always check the structure
       (by code) before accessing it.  XML is a good and easy format to
       declare your data, but extracting it as a tree, at least in my opinion,
       isn't easy.  To fix that, it came to my mind to access the data with
       some query language, like SQL.  The first idea was to access it using
       something like:

         XML.foo.bar.baz{arg1}

         X = XML.foo.bar*
         X.baz{arg1}

         XML.hosts.server[0]{argx}

       And I saw that this is very similar to Hashes and Arrays in Perl:

         $XML->{foo}{bar}{baz}{arg1} ;

         $X = $XML->{foo}{bar} ;
         $X->{baz}{arg1} ;

         $XML->{hosts}{server}[0]{argx} ;

       But the problem with Hashes and Arrays is not knowing when you have an
       Array reference or not.  For example, in XML::Simple:

         ## This is very different
         $XML->{server}{address} ;
         ## ... from this:
         $XML->{server}{address}[0] ;

       So, why not make both ways work? Because you need to do something
       crazy!

       To create XML::Smart, I first created the module Object::MultiType.
       With it you can have an object that works at the same time as a HASH,
       ARRAY, SCALAR, CODE & GLOB. So you can do things like this with the
       same object:

         $obj = Object::MultiType->new() ;

         $obj->{key} ;
         $obj->[0] ;
         $obj->method ;

         @l = @{$obj} ;
         %h = %{$obj} ;

         &$obj(args) ;

         print $obj "send data\n" ;

       It seems crazy, and it can get crazier if you use tie() inside it, and
       this is what XML::Smart does.

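       One way to get an object that answers to several kinds of access is
       the core overload pragma. The sketch below only shows that general
       technique; it does not claim to be how Object::MultiType is written.
       The class MultiView is invented for this example, and the GLOB side is
       left out:

```perl
use strict;
use warnings;

package MultiView;

# The object is a reference to a scalar holding the real data, so the
# overloaded %{} and @{} handlers can reach it without calling themselves.
use overload
    '%{}' => sub { ${ $_[0] }->{hash} },
    '@{}' => sub { ${ $_[0] }->{array} },
    '&{}' => sub { ${ $_[0] }->{code} },
    '""'  => sub { ${ $_[0] }->{scalar} },
    fallback => 1;

sub new {
    my ($class, %parts) = @_;
    my $data = {%parts};
    return bless \$data, $class;
}

package main;

my $obj = MultiView->new(
    hash   => { key => 'from hash' },
    array  => ['from array'],
    code   => sub { "called with @_" },
    scalar => 'as string',
);

print $obj->{key}, "\n";          # HASH access
print $obj->[0], "\n";            # ARRAY access
print $obj->('x', 'y'), "\n";     # CODE access
print "$obj\n";                   # SCALAR (string) access
```

       Each kind of access is routed to a different part of the same object,
       which is the property XML::Smart builds on.
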
       For XML::Smart, access in the Hash and Array way passes through
       tie(). In other words, you have a tied HASH and a tied ARRAY inside it.
       The tied Hash and Array work together, so you can access a Hash key
       as the index 0 of an Array, or access the index 0 as the Hash key:

         %hash = (
           key => ['a','b','c']
         ) ;

         $hash->{key}    ## returns $hash{key}[0]
         $hash->{key}[0] ## returns $hash{key}[0]
         $hash->{key}[1] ## returns $hash{key}[1]

         ## Inverse:

         %hash = ( key => 'a' ) ;

         $hash->{key}    ## returns $hash{key}
         $hash->{key}[0] ## returns $hash{key}
         $hash->{key}[1] ## returns undef

       The best thing about this resource is that it avoids wrong accesses to
       the data, and warnings, when you try to access a Hash where there is an
       Array (and the inverse), something that generally makes the script
       die().

       Once you have easy access to the data, you can use the same resource
       to create data!  For example:

         ## Previous data:
         <hosts>
           <server address="192.168.2.100" os="linux" type="conectiva" version="9.0"/>
         </hosts>

         ## Now you have {address} as a normal key with a string inside:
         $XML->{hosts}{server}{address}

         ## And to add a new address, the key {address} needs to be an ARRAY ref!
         ## So, XML::Smart makes the conversion: ;-P
         $XML->{hosts}{server}{address}[1] = '192.168.2.101' ;

         ## Adding to a list whose size you don't know:
         push(@{$XML->{hosts}{server}{address}} , '192.168.2.102') ;

         ## The data now:
         <hosts>
           <server os="linux" type="conectiva" version="9.0">
             <address>192.168.2.100</address>
             <address>192.168.2.101</address>
             <address>192.168.2.102</address>
           </server>
         </hosts>

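       The conversion from a string to an ARRAY ref on the first write can be
       sketched with a small tied array built on the core Tie::Array module.
       This is an assumption about the general approach, not the code inside
       XML::Smart; the class ScalarOrList and its methods _view and _promote
       are invented for this example:

```perl
use strict;
use warnings;

package ScalarOrList;
use Tie::Array;
our @ISA = ('Tie::Array');

# $slot is a reference to the place that holds a string or an ARRAY ref.
sub TIEARRAY {
    my ($class, $slot) = @_;
    return bless { slot => $slot }, $class;
}

# List view of the slot for reading; a plain string acts as a 1-item list.
sub _view {
    my ($self) = @_;
    my $value = ${ $self->{slot} };
    return ref $value eq 'ARRAY' ? $value : defined $value ? [$value] : [];
}

# Turn the slot into a real ARRAY ref the first time it is written to.
sub _promote {
    my ($self) = @_;
    my $slot = $self->{slot};
    $$slot = $self->_view unless ref $$slot eq 'ARRAY';
    return $$slot;
}

sub FETCHSIZE { scalar @{ $_[0]->_view } }
sub FETCH     { $_[0]->_view->[ $_[1] ] }
sub STORE     { $_[0]->_promote->[ $_[1] ] = $_[2] }
sub STORESIZE { $#{ $_[0]->_promote } = $_[1] - 1 }

package main;

my %server = (address => '192.168.2.100');
tie my @address, 'ScalarOrList', \$server{address};

$address[1] = '192.168.2.101';       # the string becomes an ARRAY ref here
push @address, '192.168.2.102';      # push works through the inherited PUSH

print join(', ', @{ $server{address} }), "\n";
```

       Reading never changes the stored value; only STORE and STORESIZE
       promote it, so a key that is read but never extended stays a plain
       string.
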
       Then, after changing your XML tree using the Hash and Array resources,
       you just get the data remade (through the Hash tree inside the object):

         my $xmldata = $XML->data ;

       But note that XML::Smart always returns an object, even when you get a
       final key. So this actually returns another object, pointing (inside
       it) to the key:

         $addr = $XML->{hosts}{server}{address}[0] ;

         ## Since $addr is an object you can TRY to access more data:
         $addr->{foo}{bar} ; ## This doesn't raise warnings! It just returns UNDEF.

         ## But you can use it like a normal SCALAR too:

         print "$addr\n" ;

         $addr .= ':80' ; ## After this $addr isn't an object any more, just a SCALAR!

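       The last point, an object that prints like a string and turns into a
       plain SCALAR after '.=', is standard behaviour of overloaded string
       conversion in Perl. A minimal sketch, using an invented class Leaf in
       place of the object returned by XML::Smart:

```perl
use strict;
use warnings;

package Leaf;

# Only string conversion is overloaded; fallback => 1 lets Perl build
# '.' and '.=' from it.
use overload '""' => sub { ${ $_[0] } }, fallback => 1;

sub new {
    my ($class, $text) = @_;
    return bless \$text, $class;
}

package main;

my $addr = Leaf->new('192.168.2.100');
print ref($addr) ? "object\n" : "plain scalar\n";

print "$addr\n";     # stringifies through the '""' overload

$addr .= ':80';      # concatenation yields an ordinary string
print ref($addr) ? "object\n" : "plain scalar\n";
print "$addr\n";
```

       Because '.=' stores the result of the concatenation back into $addr,
       the variable stops pointing to the object. Keep a separate copy of the
       object if you still need to reach the node afterwards.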

TODO

         * Finish XPath implementation.
         * DTD.
         * Implement a better way to declare meta tags.

SEE ALSO

       XML::Parser, XML::Parser::Lite, XML::XPath, XML.

       Object::MultiType - This is the module that makes everything possible,
       and was created specially for XML::Smart. ;-P

       ** See the test.pl script for examples of use.

       XML.com <http://www.xml.com>

AUTHOR

       Graciliano M. P. <gm@virtuasites.com.br>

       I will appreciate any type of feedback (including your opinions and/or
       suggestions). ;-P

       Enjoy, and thanks to everyone who is enjoying this tool and has sent
       e-mails! ;-P

THANKS

       Thanks to Rusty Allen for the extensive tests of CDATA and BINARY
       handling of XML::Smart.

       Thanks to Ted Haining for pointing out a Perl-5.8.0 bug for tied keys
       of a HASH.

       Thanks to everybody who has sent ideas, patches or pointed out bugs.

COPYRIGHT

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.



perl v5.12.0                      2004-10-30                     XML::Smart(3)