XML::Smart(3)         User Contributed Perl Documentation         XML::Smart(3)

NAME
XML::Smart - A smart, easy and powerful way to access/create XML
files/data.
8
DESCRIPTION
This module offers an easy way to access and create XML data. It is
based on the HASH tree built from the XML data, and enables dynamic
access to it with the Perl syntax for Hash and Array, without needing
to care whether you have a Hash or an Array in the tree. In other
words, each point in the tree works as a Hash and an Array at the same
time!
15
You also have extra resources, such as searching for nodes by
attribute, selecting an attribute value across multiple nodes, changing
the returned format, etc.
19
The module also automatically handles binary data (encoding/decoding
to/from base64), CDATA (like contents with <tags>) and Unicode. It can
be used to create XML files and to load XML from the Web (just pass a
URL as the file path), and it has an easy way to send XML data through
a socket, by adding the length of the data to the <?xml?> header.
25
You can use XML::Smart with XML::Parser, or with the 2 standard parsers
of XML::Smart:
28
29 XML::Smart::Parser
30 XML::Smart::HTMLParser.
31
32 XML::Smart::HTMLParser can be used to load/parse wild/bad XML data, or
33 HTML tags.
34
36 You can find some extra documents about XML::Smart at:
37
38 XML::Smart::Tutorial - Tutorial and examples for XML::Smart.
39 XML::Smart::FAQ - Frequently Asked Questions about XML::Smart.
40
42 ## Create the object and load the file:
43 my $XML = XML::Smart->new('file.xml') ;
44
45 ## Force the use of the parser 'XML::Smart::Parser'.
46 my $XML = XML::Smart->new('file.xml' , 'XML::Smart::Parser') ;
47
48 ## Get from the web:
49 my $XML = XML::Smart->new('http://www.perlmonks.org/index.pl?node_id=16046') ;
50
51 ## Cut the root:
52 $XML = $XML->cut_root ;
53
54 ## Or change the root:
55 $XML = $XML->{hosts} ;
56
57 ## Get the address [0] of server [0]:
58 my $srv0_addr0 = $XML->{server}[0]{address}[0] ;
59 ## ...or...
60 my $srv0_addr0 = $XML->{server}{address} ;
61
    ## Get the server where the attribute 'type' eq 'suse':
63 my $server = $XML->{server}('type','eq','suse') ;
64
65 ## Get the address again:
66 my $addr1 = $server->{address}[1] ;
67 ## ...or...
68 my $addr1 = $XML->{server}('type','eq','suse'){address}[1] ;
69
70 ## Get all the addresses of a server:
71 my @addrs = @{$XML->{server}{address}} ;
72 ## ...or...
73 my @addrs = $XML->{server}{address}('@') ;
74
75 ## Get a list of types of all the servers:
76 my @types = $XML->{server}('[@]','type') ;
77
78 ## Add a new server node:
79 my $newsrv = {
80 os => 'Linux' ,
81 type => 'Mandrake' ,
82 version => 8.9 ,
83 address => [qw(192.168.3.201 192.168.3.202)]
84 } ;
85
86 push(@{$XML->{server}} , $newsrv) ;
87
88 ## Get/rebuild the XML data:
89 my $xmldata = $XML->data ;
90
91 ## Save in some file:
92 $XML->save('newfile.xml') ;
93
94 ## Send through a socket:
    print $socket $XML->data(length => 1) ; ## put the 'length' in the XML header so that the
                                            ## receiver knows the amount of data to read.
97
98 __DATA__
99 <?xml version="1.0" encoding="iso-8859-1"?>
100 <hosts>
101 <server os="linux" type="redhat" version="8.0">
102 <address>192.168.0.1</address>
103 <address>192.168.0.2</address>
104 </server>
105 <server os="linux" type="suse" version="7.0">
106 <address>192.168.1.10</address>
107 <address>192.168.1.20</address>
108 </server>
109 <server address="192.168.2.100" os="linux" type="conectiva" version="9.0"/>
110 </hosts>
111
new (FILE|DATA|URL , PARSER , OPTIONS)
Create an XML object.
115
116 Arguments:
117
118 FILE|DATA|URL
119 The first argument can be:
120
121 - XML data as string.
122 - File path.
123 - File Handle (GLOB).
- URL (needs LWP::UserAgent).

If not passed, a null XML tree is started, in which you create
your own XML data and then build/save/send it.
128
129 PARSER (optional)
130 Set the XML parser to use. Options:
131
132 XML::Parser
133 XML::Smart::Parser
134 XML::Smart::HTMLParser
135
XML::Smart::Parser can only handle basic XML data (it does not
support PCDATA or header declarations like ENTITY, NOTATION,
etc.), but it is a good choice when you don't want to install
big modules to parse XML, since it comes with the main
module. It can still handle CDATA and binary data.
141
142 ** See "PARSING HTML as XML" for XML::Smart::HTMLParser.
143
144 Aliases for the options:
145
146 SMART|REGEXP => XML::Smart::Parser
147 HTML => XML::Smart::HTMLParser
148
149 Default:
150
If not set, it will look for XML::Parser and load it. If
XML::Parser can't be loaded it will use XML::Smart::Parser,
which is actually a clone of XML::Parser::Lite with some
fixes.
155
OPTIONS You can force upper case or lower case for tags (nodes)
and arguments (attributes), and set other extras.

lowtag  Make the tags lower case.

lowarg  Make the arguments lower case.

upertag Make the tags upper case.

uperarg Make the arguments upper case.

arg_single
Set the value of arguments to 1 when they have an
undef value.
170
** This option works only when the XML is parsed by
XML::Smart::HTMLParser, since it accepts arguments
without values:
174
175 my $xml = new XML::Smart(
176 '<root><foo arg1="" flag></root>' ,
177 'XML::Smart::HTMLParser' ,
178 arg_single => 1 ,
179 ) ;
180
In this example the option "arg_single" was used,
which sets flag to 1, while arg1 still has a null
string value ("").
184
185 Here's the tree of the example above:
186
187 'root' => {
188 'foo' => {
189 'flag' => 1,
190 'arg1' => ''
191 },
192 },
193
194 use_spaces
195 Accept contents that have only spaces.
196
on_start (CODE) *optional
Code/sub to call when a tag starts.

** This is called after XML::Smart parses the tag, and
should be used only if you want to change the tree.

on_char (CODE) *optional
Code/sub to call on content.

** This is called after XML::Smart parses the tag, and
should be used only if you want to change the tree.

on_end (CODE) *optional
Code/sub to call when a tag ends.

** This is called after XML::Smart parses the tag, and
should be used only if you want to change the tree.

** These options are applied when the XML data is loaded. For
XML generation see data() OPTIONS.
220
221 Examples of use:
222
223 my $xml_from_url = XML::Smart->new("http://www.perlmonks.org/index.pl?node_id=16046") ;
224
225 ...
226
227 my $xml_from_str = XML::Smart->new(q`<?xml version="1.0" encoding="iso-8859-1" ?>
228 <root>
229 <foo arg="xyz"/>
230 </root>
231 `) ;
232
233 ...
234
235 my $null_xml = XML::Smart->new() ;
236
237 ...
238
239 my $xml_from_html = XML::Smart->new($html_data , 'html' ,
240 lowtag => 1 ,
241 lowarg => 1 ,
242 on_char => sub {
243 my ( $tag , $pointer , $pointer_back , $cont) = @_ ;
        $pointer->{extra_arg} = 123 ; ## add an extra argument.
245 $pointer_back->{$tag}{extra_arg} = 123 ; ## Same, but using the previous pointer.
246 $$cont .= "\n" ; ## append data to the content.
247 }
248 ) ;
249
250 apply_dtd (DTD , OPTIONS)
251 Apply the DTD to the XML tree.
252
253 DTD can be a source, file, GLOB or URL.
254
This method is useful if you need the XML generated by data() to be
formatted according to a specific DTD: elements will be made nodes
automatically, attributes will be checked, required elements and
attributes will be created, the element order will be set, etc.
259
260 OPTIONS:
261
262 no_delete BOOL
If TRUE, elements and attributes that are not defined in the
DTD won't be deleted from the XML tree.
265
266 Example of use:
267
268 $xml->apply_dtd(q`
269 <!DOCTYPE cds [
270 <!ELEMENT cds (album+)>
271 <!ATTLIST cds
272 creator CDATA
273 date CDATA #REQUIRED
274 type (a|b|c) #REQUIRED "a"
275 >
276 <!ELEMENT album (#PCDATA)>
277 ]>
278 ` ,
279 no_delete => 1 ,
280 );
281
282 args()
Return the argument names (not nodes).

args_values()
Return the argument values (not nodes).
287
288 back()
Move the pointer back one level in the tree.

** See base().
292
293 base()
294 Get back to the base of the tree.
295
Each query to the XML::Smart object returns an object pointing to a
different place in the tree (all sharing the same HASH tree). So, you
can get the main object again (an object that points to the base):

    my $srv = $XML->{root}{hosts}{server} ;
    my $addr = $srv->{address} ;
    my $XML2 = $srv->base() ;
    $XML2->{root}{hosts}...
304
305 content()
306 Return the content of a node:
307
308 ## Data:
309 <foo>my content</foo>
310
311 ## Access:
312
313 my $content = $XML->{foo}->content ;
314 print "<<$content>>\n" ; ## show: <<my content>>
315
316 ## or just:
317 my $content = $XML->{foo} ;
318
319 Also can be used with multiple contents:
320
321 For this XML data:
322
323 <root>
324 content0
325 <tag1 arg="1"/>
326 content1
327 </root>
328
329 Getting all the content:
330
331 my $all_content = $XML->{root}->content ;
332 print "[$all_content]\n" ;
333
334 Output:
335
336 [
337 content0
338
339 content1
340 ]
341
342 Getting in parts:
343
344 my @contents = $XML->{root}->content ;
    print "[$contents[0]]\n" ;
    print "[$contents[1]]\n" ;
347
348 Output
349
350 [
351 content0
352 ]
353 [
354 content1
355 ]
356
357 Setting multiple contents:
358
359 $XML->{root}->content(0,"aaaaa") ;
360 $XML->{root}->content(1,"bbbbb") ;
361
362 Output now will be:
363
364 [aaaaa]
365 [bbbbb]
366
367 And now the XML data generated will be:
368
369 <root>aaaaa<tag1 arg="1"/>bbbbb</root>
370
371 copy()
372 Return a copy of the XML::Smart object (pointing to the base).
373
374 ** This is good when you want to keep 2 versions of the same XML tree
375 in the memory, since one object can't change the tree of the other!
376
377 cut_root()
378 Cut the root key:
379
380 my $srv = $XML->{rootx}{host}{server} ;
381
382 ## Or if you don't know the root name:
383 $XML = $XML->cut_root() ;
384 my $srv = $XML->{host}{server} ;
385
** Note that this cuts the root of the pointer in the tree. So, if you
are at a place that has more than one key (multiple roots), the same
object is returned without cutting anything.
389
390 data (OPTIONS)
391 Return the data of the XML object (rebuilding it).
392
393 Options:
394
395 nodtd Do not add in the XML content the DTD applied by the method
396 apply_dtd().
397
noident If set to true the data isn't indented.

nospace If set to true the data isn't indented and doesn't have spaces
between the tags (unless the CONTENT has them).
402
403 lowtag Make the tags lower case.
404
405 lowarg Make the arguments lower case.
406
407 upertag Make the tags uper case.
408
409 uperarg Make the arguments uper case.
410
411 length If set true, add the attribute 'length' with the size of the
412 data to the xml header (<?xml ...?>). This is useful when
413 you send the data through a socket, since the socket can know
414 the total amount of data to read.
415
416 noheader Do not add the <?xml ...?> header.
417
418 nometagen Do not add the meta generator tag: <?meta
419 generator="XML::Smart" ?>
420
421 meta Set the meta tags of the XML document.
422
423 Examples:
424
425 my $meta = {
426 build_from => "wxWindows 2.4.0" ,
427 file => "wx26.htm" ,
428 } ;
429
430 print $XML->data( meta => $meta ) ;
431
432 __DATA__
    <?meta build_from="wxWindows 2.4.0" file="wx26.htm" ?>
434
435 Multiple meta:
436
437 my $meta = [
438 {build_from => "wxWindows 2.4.0" , file => "wx26.htm" } ,
439 {script => "genxml.pl" , ver => "1.0" } ,
440 ] ;
441
442 __DATA__
443 <?meta build_from="wxWindows 2.4.0" file="wx26.htm" ?>
444 <?meta script="genxml.pl" ver="1.0" ?>
445
446 Or set directly the meta tag:
447
448 my $meta = '<?meta foo="bar" ?>' ;
449
450 ## For multiple:
451 my $meta = ['<?meta foo="bar" ?>' , '<?meta x="1" ?>'] ;
452
453 print $XML->data( meta => $meta ) ;
454
tree    Set the HASH tree to parse. If not set, the tree of the
XML::Smart object (tree()) is used.
457
458 wild Accept wild tags and arguments.
459
** This won't fix wrong keys and tags.
461
462 sortall Sort all the tags alphabetically. If not set will keep the
463 order of the document loaded, or the order of tag creation.
464 Default: off
465
466 data_pointer (OPTIONS)
467 Make the tree from current point in the XML tree (not from the base as
468 data()).
469
470 Accept the same OPTIONS of the method data().
471
472 dump_tree()
473 Dump the tree of the object using Data::Dumper.
474
475 dump_tree_pointer()
476 Dump the tree of the object, from the pointer, using Data::Dumper.
477
478 dump_pointer()
479 ** Same as dump_tree_pointer().
480
481 i()
482 Return the index of the value.
483
484 ** If the value is from an hash key (not an ARRAY ref) undef is
485 returned.
486
487 is_node()
Return true if the key is a node.
489
490 key()
491 Return the key of the value.
492
493 If wantarray return the index too: return(KEY , I) ;
494
495 nodes()
496 Return the nodes (objects) in the pointer (keys that aren't arguments).
497
498 nodes_keys()
499 Return the nodes names (not the object) in the pointer (keys that
500 aren't arguments).
501
502 null()
503 Return true if the XML object has a null tree or if the pointer is in
504 some place that doesn't exist.
505
506 order()
507 Return the order of the keys. See set_order().
508
509 path()
510 Return the path of the pointer.
511
512 Example:
513
514 /hosts/server[1]/address[0]
515
Note that the index is 0 based and 'address' can be an attribute or a
node, which is not compatible with XPath.
518
519 ** See path_as_xpath().
520
521 path_as_xpath()
522 Return the path of the pointer in the XPath format.
523
524 pointer
525 Return the HASH tree from the pointer.
526
527 pointer_ok
528 Return a copy of the tree of the object, from the pointer, but without
529 internal keys added by XML::Smart.
530
531 root
532 Return the ROOT name of the XML tree (main key).
533
534 ** See also key() for sub nodes.
535
536 save (FILEPATH , OPTIONS)
537 Save the XML data inside a file.
538
539 Accept the same OPTIONS of the method data().
540
541 set_auto
Define the key to be handled automatically. So, data() will decide
automatically whether it's a node, content or attribute.
544
545 ** This method is useful to remove set_node(), set_cdata() and
546 set_binary() changes.
547
548 set_auto_node
Define the key as a node, and data() will decide automatically whether
it's CDATA or BINARY.
551
552 ** This method is useful to remove set_cdata() and set_binary()
553 changes.
554
555 set_binary(BOOL)
556 Define the node as a BINARY content when TRUE, or force to not handle
557 it as a BINARY on FALSE.
558
559 Example of node handled as BINARY:
560
561 <root><foo dt:dt="binary.base64">PGgxPnRlc3QgAzwvaDE+</foo></root>
562
563 Original content of foo (the base64 data):
564
565 <h1>test \x03</h1>
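
The relation between the two forms above can be checked with the core
module MIME::Base64. This is only a small sketch of the base64 round
trip, not the code XML::Smart runs internally; the variable names are
chosen here for illustration:

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64 encode_base64);

# The content of <foo> from the example above.
my $encoded = 'PGgxPnRlc3QgAzwvaDE+';
my $decoded = decode_base64($encoded);

# Show the control character as an escape so that it is visible.
(my $printable = $decoded) =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
print "$printable\n";   # prints: <h1>test \x03</h1>

# Encoding again (second argument '' means no line breaks) gives back
# the text stored in the node.
my $again = encode_base64($decoded, '');
print "$again\n";       # prints: PGgxPnRlc3QgAzwvaDE+
```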
566
567 set_cdata(BOOL)
568 Define the node as CDATA when TRUE, or force to not handle it as CDATA
569 on FALSE.
570
571 Example of CDATA node:
572
573 <root><foo><![CDATA[bla bla bla <tag> bla bla]]></foo></root>
574
575 set_node(BOOL)
576 Set/unset the current key as a node (tag).
577
578 ** If BOOL is not defined will use TRUE.
579
580 set_order(KEYS)
581 Set the order of the keys (nodes and attributes) in this point.
582
583 set_tag
584 Same as set_node.
585
586 tree()
587 Return the HASH tree of the XML data.
588
589 ** Note that the real HASH tree is returned here. All the other ways
590 return an object that works like a HASH/ARRAY through tie.
591
592 tree_pointer()
593 Same as pointer().
594
595 tree_ok()
596 Return a copy of the tree of the object, but without internal keys
597 added by XML::Smart, like /order and /nodes.
598
599 tree_pointer_ok()
600 Return a copy of the tree of the object, from the pointer, but without
601 internal keys added by XML::Smart.
602
603 xpath() || XPath()
Return an XML::XPath object, based on the XML root in the tree.

    ## look from the root:
    my $data = $XML->XPath->findnodes_as_string('/') ;

** Needs XML::XPath installed, but it is only loaded when needed.
610
611 xpath_pointer() || XPath_pointer()
Return an XML::XPath object, based on the XML::Smart pointer in the
tree.

    ## look from this point, so XPath '/' actually starts at /server/:

    my $srvs = $XML->{server} ;
    my $data = $srvs->XPath_pointer->findnodes_as_string('/') ;

** Needs XML::XPath installed, but it is only loaded when needed.
621
623 To access the data you use the object in a way similar to HASH and
624 ARRAY:
625
626 my $XML = XML::Smart->new('file.xml') ;
627
628 my $server = $XML->{server} ;
629
But when you get a key such as {server}, you are actually accessing the
data through tie(), not directly in the HASH tree inside the object.
(This fixes wrong accesses):
633
634 ## {server} is a normal key, not an ARRAY ref:
635
636 my $server = $XML->{server}[0] ; ## return $XML->{server}
637 my $server = $XML->{server}[1] ; ## return UNDEF
638
639 ## {server} has an ARRAY with 2 items:
640
641 my $server = $XML->{server} ; ## return $XML->{server}[0]
642 my $server = $XML->{server}[0] ; ## return $XML->{server}[0]
643 my $server = $XML->{server}[1] ; ## return $XML->{server}[1]
644
645 To get all the values of multiple elements/keys:
646
    ## This works whether {address} holds only a string or an ARRAY ref:
    my @addresses = @{$XML->{server}{address}} ;
649
650 Select search
651 When you don't know the position of the nodes, you can select it by
652 some attribute value:
653
654 my $server = $XML->{server}('type','eq','suse') ; ## return $XML->{server}[1]
655
656 Syntax for the select search:
657
658 (NAME, CONDITION , VALUE)
659
660 NAME The attribute name in the node (tag).
661
662 CONDITION Can be
663
664 eq ne == != <= >= < >
665
666 For REGEX:
667
668 =~ !~
669
670 ## Case insensitive:
671 =~i !~i
672
673 VALUE The value.
674
675 For REGEX use like this:
676
677 $XML->{server}('type','=~','^s\w+$') ;
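
To make the meaning of (NAME, CONDITION, VALUE) concrete, here is a
plain-Perl sketch of the same filtering done over an ordinary array of
hash refs. It is not how XML::Smart implements the search; the helper
select_nodes() and the table %CONDITION are hypothetical names used only
for this illustration:

```perl
use strict;
use warnings;

# One comparison sub per CONDITION string listed above.
my %CONDITION = (
    'eq'  => sub { $_[0] eq $_[1] },
    'ne'  => sub { $_[0] ne $_[1] },
    '=='  => sub { $_[0] == $_[1] },
    '!='  => sub { $_[0] != $_[1] },
    '<='  => sub { $_[0] <= $_[1] },
    '>='  => sub { $_[0] >= $_[1] },
    '<'   => sub { $_[0] <  $_[1] },
    '>'   => sub { $_[0] >  $_[1] },
    '=~'  => sub { my ($v, $re) = @_; $v =~ /$re/  },
    '!~'  => sub { my ($v, $re) = @_; $v !~ /$re/  },
    '=~i' => sub { my ($v, $re) = @_; $v =~ /$re/i },
    '!~i' => sub { my ($v, $re) = @_; $v !~ /$re/i },
);

# Hypothetical helper: return the nodes whose attribute NAME satisfies
# CONDITION against VALUE. Nodes without the attribute are skipped.
sub select_nodes {
    my ($nodes, $name, $cond, $value) = @_;
    my $test = $CONDITION{$cond} or die "Unknown condition '$cond'";
    return grep { defined $_->{$name} && $test->($_->{$name}, $value) } @$nodes;
}

my @servers = (
    { os => 'linux', type => 'redhat',    version => '8.0' },
    { os => 'linux', type => 'suse',      version => '7.0' },
    { os => 'linux', type => 'conectiva', version => '9.0' },
);

my ($suse)  = select_nodes(\@servers, 'type', 'eq', 'suse');
my @s_names = map { $_->{type} } select_nodes(\@servers, 'type', '=~', '^s\w+$');
my @newer   = map { $_->{type} } select_nodes(\@servers, 'version', '>=', 8);

print "$suse->{version}\n";   # prints: 7.0
print "@s_names\n";           # prints: suse
print "@newer\n";             # prints: redhat conectiva
```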
678
679 Select attributes in multiple nodes:
680 You can get the list of values of an attribute looking in all multiple
681 nodes:
682
683 ## Get all the server types:
684 my @types = $XML->{server}('[@]','type') ;
685
686 Also as:
687
688 my @types = $XML->{server}{type}('<@') ;
689
690 Without the resource:
691
692 my @list ;
693 my @servers = @{$XML->{server}} ;
694
695 foreach my $servers_i ( @servers ) {
696 push(@list , $servers_i->{type} ) ;
697 }
698
699 Return format
700 You can change the returned format:
701
702 Syntax:
703
704 (TYPE)
705
706 Where TYPE can be:
707
708 $ ## the content.
709 @ ## an array (list of multiple values).
710 % ## a hash.
711 . ## The exact point in the tree, not an object.
712
    $@ ## an array, but with the contents, not objects.
    $% ## a hash, but the values are the contents, not objects.

    ## Using $@ and $% is good if you don't want to keep the object
    ## reference (and want to save memory).

    @keys ## The keys of the node. Note that if you have a key with
          ## multiple nodes, it will be repeated (this is the
          ## difference from "keys %{$this->{node}}" ).
722
723 <@ ## Return the attribute in the previous node, but looking for
724 ## multiple nodes. Example:
725
726 my @names = $this->{method}{wxFrame}{arg}{name}('<@') ;
727 #### @names = (parent , id , title) ;
728
729 <xml> ## Return a XML data from this point.
730
731 __DATA__
732 <method>
733 <wxFrame return="wxFrame">
734 <arg name="parent" type="wxWindow" />
735 <arg name="id" type="wxWindowID" />
736 <arg name="title" type="wxString" />
737 </wxFrame>
738 </method>
739
740 Example:
741
    ## A server's content
743 my $name = $XML->{server}{name}('$') ;
744 ## ... or:
745 my $name = $XML->{server}{name}->content ;
746 ## ... or:
747 my $name = $XML->{server}{name} ;
748 $name = "$name" ;
749
750 ## All the servers
751 my @servers = $XML->{server}('@') ;
752 ## ... or:
753 my @servers = @{$XML->{server}} ;
754
755 ## It still has the object reference:
    $servers[0]->{name} ;
757
758 ## Without the reference:
759 my @servers = $XML->{server}('$@') ;
760
761 ## A XML data, same as data_pointer():
762 my $xml_data = $XML->{server}('<xml>') ;
763
764 CONTENT
765 If a {key} has a content you can access it directly from the variable
766 or from the method:
767
768 my $server = $XML->{server} ;
769
770 print "Content: $server\n" ;
771 ## ...or...
772 print "Content: ". $server->content ."\n" ;
773
774 So, if you use the object as a string it works as a string, if you use
775 as an object it works as an object! ;-P
776
777 **See the method content() for more.
778
Creating XML data is easy: you just use the object as a normal HASH,
but you don't need to care about multiple nodes or ARRAY
creation/conversion!
782
783 ## Create a null XML object:
784 my $XML = XML::Smart->new() ;
785
786 ## Add a server to the list:
787 $XML->{server} = {
788 os => 'Linux' ,
789 type => 'mandrake' ,
790 version => 8.9 ,
791 address => '192.168.3.201' ,
792 } ;
793
794 ## The data now:
795 <server address="192.168.3.201" os="Linux" type="mandrake" version="8.9"/>
796
    ## Add a new address to the server. This creates an ARRAY and converts
    ## the previous key to an ARRAY:
799 $XML->{server}{address}[1] = '192.168.3.202' ;
800
801 ## The data now:
802 <server os="Linux" type="mandrake" version="8.9">
803 <address>192.168.3.201</address>
804 <address>192.168.3.202</address>
805 </server>
806
After creating your XML tree you just save it or get the data:
808
809 ## Get the data:
810 my $data = $XML->data ;
811
812 ## Or save it directly:
813 $XML->save('newfile.xml') ;
814
815 ## Or send to a socket:
816 print $socket $XML->data(length => 1) ;
817
819 From version 1.2 XML::Smart can handle binary data and CDATA blocks
820 automatically.
821
822 When parsing, binary data will be detected as:
823
824 <code dt:dt="binary.base64">f1NPTUUgQklOQVJZIERBVEE=</code>
825
since this is the official format for binary data at XML.com
<http://www.xml.com/pub/a/98/07/binary/binary.html>. The content will
be decoded from base64 and saved in the object tree.
829
830 CDATA will be parsed as any other content, since CDATA is only a block
831 that won't be parsed.
832
When creating XML data, as with $XML->data(), binary and CDATA content
are detected using these rules:

BINARY:
- If it has characters that can't be in XML.

* Characters accepted:

\s \w \d
!"#$%&'()*+,-./:;<=>?@[\]^`{|}~
plus the extended (8-bit) ISO-8859-1 symbols and accented
letters, up to À through ÿ.

CDATA:
- If it has tags: <...>

CONTENT: (<tag>content</tag>)
- If it has \r\n\t, or ' and " at the same time.
851
852 So, this will be a CDATA content:
853
854 <code><![CDATA[
855 line1
856 <tag_not_parsed>
857 line2
858 ]]></code>
859
860 If a binary content is detected, it will be converted to base64 and a
861 dt:dt attribute added in the tag to tell the format.
862
863 <code dt:dt="binary.base64">f1NPTUUgQklOQVJZIERBVEE=</code>
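
The decision described in this section can be sketched in plain Perl as
below. This is a simplified illustration, not the module's code: the
helpers classify_content() and render_node() are hypothetical, the
binary test is reduced to "has a control character other than tab, CR
or LF", and entity escaping of plain content is left out:

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: decide how a content string would be written.
sub classify_content {
    my ($content) = @_;
    # Assumption: control characters (except \t \n \r) mark binary data.
    return 'binary' if $content =~ /[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]/;
    # Anything that looks like a tag goes into a CDATA block.
    return 'cdata'  if $content =~ /<[^>]+>/;
    return 'content';
}

# Hypothetical helper: build the node text for each of the three cases.
sub render_node {
    my ($tag, $content) = @_;
    my $kind = classify_content($content);
    if ($kind eq 'binary') {
        my $b64 = encode_base64($content, '');   # '' = no line breaks
        return qq{<$tag dt:dt="binary.base64">$b64</$tag>};
    }
    return "<$tag><![CDATA[$content]]></$tag>" if $kind eq 'cdata';
    return "<$tag>$content</$tag>";
}

print render_node('code', "\x7fSOME BINARY DATA"), "\n";
print render_node('foo',  'bla bla bla <tag> bla bla'), "\n";
print render_node('name', 'plain text'), "\n";
```

The first call gives the same dt:dt="binary.base64" node shown above,
and the second gives a CDATA block like the one in set_cdata().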
864
XML::Smart supports only these 2 encodings, Unicode (UTF-8) and
ASCII-extended (ISO-8859-1), which should be enough. (Note that UTF-8
is only supported on Perl-5.8+).
869
870 When creating XML data, if any UTF-8 character is detected the encoding
871 attribute in the <?xml ...?> header will be set to UTF-8:
872
873 <?xml version="1.0" encoding="utf-8" ?>
874 <data>A~X</data>
875
876 If not, the iso-8859-1 is used:
877
878 <?xml version="1.0" encoding="iso-8859-1" ?>
879 <data>X</data>
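
As a rough sketch of this choice (not the detection XML::Smart really
performs, which is not described here), the header could be picked by
looking for any character that does not fit in ISO-8859-1. The helper
xml_header_for() is a hypothetical name, and the "above 255" test is an
assumption of this example:

```perl
use strict;
use warnings;

# Hypothetical helper: pick the encoding attribute for the header.
sub xml_header_for {
    my ($data) = @_;
    # Assumption: a character above 255 cannot be stored as ISO-8859-1.
    my $encoding = $data =~ /[^\x00-\xff]/ ? 'utf-8' : 'iso-8859-1';
    return qq{<?xml version="1.0" encoding="$encoding" ?>};
}

print xml_header_for("caf\x{e9}"), "\n";       # header with iso-8859-1
print xml_header_for("snow \x{2603}"), "\n";   # header with utf-8
```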
880
When loading XML data with UTF-8, Perl (5.8+) should do all the work
internally.
883
PARSING HTML as XML
You can use the special parser XML::Smart::HTMLParser to "use" HTML, or
XML data that is not well-formed, as XML.
887
The differences between a normal XML parser and XML::Smart::HTMLParser
are:
890
891 - Accept values without quotes:
892 <foo bar=x>
893
894 - Accept any data in the values, including <> and &:
895 <root><echo sample="echo \"Hello!\">out.txt"></root>
896
- Accept URI values without quotes:
898 <link url=http://www.foo.com/dir/file?query?q=v&x=y target=#_blank>
899
900 - Don't need to close the tags adding the '/' before '>':
901 <root><foo bar="1"></root>
902
** Note that the parser will try hard to detect the nodes, and whether
they auto-close or not.
905
906 - Don't need to have only one root:
907 <foo>data</foo><bar>data</bar>
908
So, XML::Smart::HTMLParser is a wild way to load marked-up data (like
HTML), or to avoid caring about quotes, end tags, etc. when writing
your XML data by hand. You can write a bad XML file by hand, load it
with XML::Smart::HTMLParser, and rewrite it well-formed by saving it
again! ;-P
914
915 ** Note that <SCRIPT> tags will only parse right if the content is
916 inside comments <!--...-->, since they can have tags:
917
918 <SCRIPT LANGUAGE="JavaScript"><!--
919 document.writeln("some <tag> in the string");
920 --></SCRIPT>
921
Entities (ENTITY) are handled by the parser. So, if you use XML::Parser
it will do the whole job. But if you use XML::Smart::Parser or
XML::Smart::HTMLParser, only the basic (default) entities will be
parsed:

    &lt;    => The less than sign (<).
    &gt;    => The greater than sign (>).
    &amp;   => The ampersand (&).
    &apos;  => The single quote or apostrophe (').
    &quot;  => The double quote (").

    &#ddd;  => An ASCII character or a Unicode character (>255), where ddd is a decimal number.
    &#xHHH; => A Unicode character, where HHH is in hexadecimal.
936
When creating XML data, already existing entities won't be changed, and
the characters '<', '&' and '>' will be converted to the appropriate
entity.

** Note that if the content has a <tag>, the characters '<' and '>'
won't be converted to entities, and the content will be placed inside a
CDATA block.
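
The conversion rule for content without tags can be sketched with three
substitutions. This is an illustration in plain Perl under the rules
stated above, not the module's own code; escape_content() is a
hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: escape '<', '>' and '&', leaving alone any '&'
# that already starts an entity (&name; &#ddd; or &#xHHH;).
sub escape_content {
    my ($text) = @_;
    $text =~ s/&(?!(?:[A-Za-z]\w*|#\d+|#x[0-9A-Fa-f]+);)/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print escape_content('a < b & c &gt; d &#169; AT&T'), "\n";
# prints: a &lt; b &amp; c &gt; d &#169; AT&amp;T
```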
944
Everyone who has tried to use Perl HASH and ARRAY to access XML data,
as in XML::Simple, has had problems adding new nodes, or accessing a
node without knowing whether it's inside an ARRAY, a HASH or a HASH
key. XML::Smart builds around this a very dynamic way to access the
data, since any node/point in the tree can be a HASH and an ARRAY at
the same time. You also have other extra resources, like a search for
nodes by attribute:
953
954 my $server = $XML->{server}('type','eq','suse') ; ## This syntax is not wrong! ;-)
955
956 ## Instead of:
957 my $server = $XML->{server}[1] ;
958
959 __DATA__
960 <hosts>
    <server os="linux" type="redhat" version="8.0"/>
    <server os="linux" type="suse" version="7.0"/>
963 </hosts>
964
The idea for this module came from the problem of accessing a complex
structure in XML. You need to know what the structure looks like,
something that is generally found out by looking at the XML file
(which is wrong). But at the same time it is hard to always check the
structure (by code) before accessing it. XML is a good and easy format
to declare your data, but extracting it as a tree, at least in my
opinion, isn't easy. To fix that, I thought of a way to access the
data with some query language, like SQL. The first idea was to access
it using something like:
973
974 XML.foo.bar.baz{arg1}
975
976 X = XML.foo.bar*
977 X.baz{arg1}
978
979 XML.hosts.server[0]{argx}
980
981 And saw that this is very similar to Hashes and Arrays in Perl:
982
983 $XML->{foo}{bar}{baz}{arg1} ;
984
985 $X = $XML->{foo}{bar} ;
986 $X->{baz}{arg1} ;
987
988 $XML->{hosts}{server}[0]{argx} ;
989
But the problem with Hash and Array is not knowing whether you have an
Array reference or not. For example, in XML::Simple:

    ## This is very different
    $XML->{server}{address} ;
    ## ... from this:
    $XML->{server}{address}[0] ;

So, why not make both ways work? Because you need to do something
crazy!
1000
To create XML::Smart, I first created the module Object::MultiType.
With it you can have an object that works at the same time as a HASH,
ARRAY, SCALAR, CODE & GLOB. So you can do things like this with the
same object:
1005
1006 $obj = Object::MultiType->new() ;
1007
1008 $obj->{key} ;
1009 $obj->[0] ;
1010 $obj->method ;
1011
1012 @l = @{$obj} ;
1013 %h = %{$obj} ;
1014
1015 &$obj(args) ;
1016
1017 print $obj "send data\n" ;
1018
This seems crazy, and gets crazier if you use tie() inside it, which is
what XML::Smart does.

For XML::Smart, access in the Hash and Array way passes through tie().
In other words, you have a tied HASH and a tied ARRAY inside it. The
tied Hash and Array work together, so you can access a Hash key as
index 0 of an Array, or access index 0 as the Hash key:
1026
1027 %hash = (
1028 key => ['a','b','c']
1029 ) ;
1030
1031 $hash->{key} ## return $hash{key}[0]
1032 $hash->{key}[0] ## return $hash{key}[0]
1033 $hash->{key}[1] ## return $hash{key}[1]
1034
1035 ## Inverse:
1036
1037 %hash = ( key => 'a' ) ;
1038
1039 $hash->{key} ## return $hash{key}
1040 $hash->{key}[0] ## return $hash{key}
1041 $hash->{key}[1] ## return undef
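
The behaviour shown in the two tables above can be imitated in a few
lines with Perl's core overload pragma. This is only a sketch of the
idea, not the tie() based code of XML::Smart or Object::MultiType; the
class name DualNode is hypothetical:

```perl
use strict;
use warnings;

package DualNode;

# Dereference hooks: the same object answers to {key} and to [index].
use overload
    '%{}' => sub { my $v = ${ $_[0] }; ref($v) eq 'ARRAY' ? $v->[0] : $v },
    '@{}' => sub { my $v = ${ $_[0] }; ref($v) eq 'ARRAY' ? $v : [$v] };

# The object is a blessed SCALAR ref, so ${...} reaches the raw value
# without triggering the overloaded hash/array dereference.
sub new {
    my ($class, $value) = @_;
    return bless \$value, $class;
}

package main;

my $single = DualNode->new({ type => 'suse' });
my $multi  = DualNode->new([ { type => 'redhat' }, { type => 'suse' } ]);

print $single->{type}, "\n";      # prints: suse
print $single->[0]{type}, "\n";   # prints: suse
print $multi->{type}, "\n";       # prints: redhat
print $multi->[1]{type}, "\n";    # prints: suse
print scalar(@$single), "\n";     # prints: 1
```

A single hash is seen as a one-element list, and a list is seen as its
first element when used as a hash, which is the same rule as above.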
1042
The best thing about this resource is that it avoids wrong accesses to
the data, and the warnings you get when you try to access a Hash where
there is an Array (and the inverse), which generally make the script
die().
1046
1047 Once having an easy access to the data, you can use the same resource
1048 to create data! For example:
1049
1050 ## Previous data:
1051 <hosts>
1052 <server address="192.168.2.100" os="linux" type="conectiva" version="9.0"/>
1053 </hosts>
1054
1055 ## Now you have {address} as a normal key with a string inside:
1056 $XML->{hosts}{server}{address}
1057
    ## And to add a new address, the key {address} needs to be an ARRAY ref!
    ## So, XML::Smart makes the conversion: ;-P
1060 $XML->{hosts}{server}{address}[1] = '192.168.2.101' ;
1061
1062 ## Adding to a list that you don't know the size:
1063 push(@{$XML->{hosts}{server}{address}} , '192.168.2.102') ;
1064
1065 ## The data now:
1066 <hosts>
    <server os="linux" type="conectiva" version="9.0">
1068 <address>192.168.2.100</address>
1069 <address>192.168.2.101</address>
1070 <address>192.168.2.102</address>
1071 </server>
1072 </hosts>
1073
Then, after changing your XML tree using the Hash and Array resources,
you just get the data rebuilt (from the Hash tree inside the object):
1076
1077 my $xmldata = $XML->data ;
1078
But note that XML::Smart always returns an object, even when you get a
final key. So this actually returns another object, pointing (inside
it) to the key:
1082
1083 $addr = $XML->{hosts}{server}{address}[0] ;
1084
1085 ## Since $addr is an object you can TRY to access more data:
    $addr->{foo}{bar} ; ## This doesn't raise warnings! It just returns UNDEF.
1087
1088 ## But you can use it like a normal SCALAR too:
1089
1090 print "$addr\n" ;
1091
1092 $addr .= ':80' ; ## After this $addr isn't an object any more, just a SCALAR!
1093
1095 * Finish XPath implementation.
1096 * DTD.
1097 * Implement a better way to declare meta tags.
1098
1100 XML::Parser, XML::Parser::Lite, XML::XPath, XML.
1101
Object::MultiType - This is the module that makes everything possible,
and was created specially for XML::Smart. ;-P
1104
1105 ** See the test.pl script for examples of use.
1106
1107 XML.com <http://www.xml.com>
1108
1110 Graciliano M. P. <gm@virtuasites.com.br>
1111
I will appreciate any type of feedback (including your opinions and/or
suggestions). ;-P

Enjoy, and thanks to everyone who is enjoying this tool and has sent
e-mails! ;-P
1117
1119 Thanks to Rusty Allen for the extensive tests of CDATA and BINARY
1120 handling of XML::Smart.
1121
Thanks to Ted Haining for pointing out a Perl-5.8.0 bug for tied keys
of a HASH.

Thanks to everybody who has sent ideas, patches or pointed out bugs.
1126
1128 This program is free software; you can redistribute it and/or modify it
1129 under the same terms as Perl itself.
1130

perl v5.12.0                      2004-10-30                     XML::Smart(3)