XML::Smart(3) User Contributed Perl Documentation XML::Smart(3)

XML::Smart - A smart, easy and powerful way to access or create XML
from files, data and URLs.
8
10 Version 1.79
11
13 This module provides an easy way to access/create XML data. It's based
14 on a HASH tree created from the XML data, and enables dynamic access to
15 it through the standard Perl syntax for Hash and Array, without
16 necessarily caring about which you are working with. In other words,
17 each point in the tree works as a Hash and an Array at the same time!
18
19 This module additionally provides special resources such as: search for
20 nodes by attribute, select an attribute value in each multiple node,
21 change the returned format, and so on.
22
The module also automatically handles binary data (encoding/decoding
to/from base64), CDATA (such as content containing <tags>) and Unicode.
It can be used to create XML files, to load XML from the Web (just by
using a URL as the file path), and it offers an easy way to send XML
data through sockets: just add the length of the data to the <?xml?>
header.
28
29 You can use XML::Smart with XML::Parser, or with the 2 standard parsers
30 of XML::Smart:
31
32 XML::Smart::Parser
33 XML::Smart::HTMLParser.
34
35 XML::Smart::HTMLParser can be used to load/parse wild/bad XML data, or
36 HTML tags.
37
39 You can find some extra documents about XML::Smart at:
40
41 XML::Smart::Tutorial - Tutorial and examples for XML::Smart.
42 XML::Smart::FAQ - Frequently Asked Questions about XML::Smart.
43
45 ## Create the object and load the file:
46 my $XML = XML::Smart->new('file.xml') ;
47
48 ## Force the use of the parser 'XML::Smart::Parser'.
49 my $XML = XML::Smart->new('file.xml' , 'XML::Smart::Parser') ;
50
51 ## Get from the web:
52 my $XML = XML::Smart->new('http://www.perlmonks.org/index.pl?node_id=16046') ;
53
54 ## Cut the root:
55 $XML = $XML->cut_root ;
56
57 ## Or change the root:
58 $XML = $XML->{hosts} ;
59
60 ## Get the address [0] of server [0]:
61 my $srv0_addr0 = $XML->{server}[0]{address}[0] ;
62 ## ...or...
63 my $srv0_addr0 = $XML->{server}{address} ;
64
## Get the server where the attribute 'type' eq 'suse':
66 my $server = $XML->{server}('type','eq','suse') ;
67
68 ## Get the address again:
69 my $addr1 = $server->{address}[1] ;
70 ## ...or...
71 my $addr1 = $XML->{server}('type','eq','suse'){address}[1] ;
72
73 ## Get all the addresses of a server:
74 my @addrs = @{$XML->{server}{address}} ;
75 ## ...or...
76 my @addrs = $XML->{server}{address}('@') ;
77
78 ## Get a list of types of all the servers:
79 my @types = $XML->{server}('[@]','type') ;
80
81 ## Add a new server node:
82 my $newsrv = {
83 os => 'Linux' ,
84 type => 'Mandrake' ,
85 version => 8.9 ,
86 address => [qw(192.168.3.201 192.168.3.202)]
87 } ;
88
89 push(@{$XML->{server}} , $newsrv) ;
90
91 ## Get/rebuild the XML data:
92 my $xmldata = $XML->data ;
93
94 ## Save in some file:
95 $XML->save('newfile.xml') ;
96
97 ## Send through a socket:
## Show the 'length' in the XML header so the receiving side
## knows the amount of data to read:
print $socket $XML->data(length => 1) ;
100
101 __DATA__
102 <?xml version="1.0" encoding="iso-8859-1"?>
103 <hosts>
104 <server os="linux" type="redhat" version="8.0">
105 <address>192.168.0.1</address>
106 <address>192.168.0.2</address>
107 </server>
108 <server os="linux" type="suse" version="7.0">
109 <address>192.168.1.10</address>
110 <address>192.168.1.20</address>
111 </server>
112 <server address="192.168.2.100" os="linux" type="conectiva" version="9.0"/>
113 </hosts>
114
116 new (FILE|DATA|URL , PARSER , OPTIONS)
117 Create a XML object.
118
119 Arguments:
120
121 FILE|DATA|URL
122 The first argument can be:
123
124 - XML data as string.
125 - File path.
126 - File Handle (GLOB).
127 - URL (Need LWP::UserAgent).
128
If not passed, a null XML tree is started, in which you create
your own XML data and then build/save/send it.
131
132 PARSER (optional)
133 Set the XML parser to use. Options:
134
135 XML::Parser
136 XML::Smart::Parser
137 XML::Smart::HTMLParser
138
XML::Smart::Parser can only handle basic XML data (PCDATA and
headers such as ENTITY, NOTATION, etc. are not supported),
but it is a good choice when you don't want to install big
modules to parse XML, since it comes with the main module.
It can still handle CDATA and binary data.
144
145 ** See "PARSING HTML as XML" for XML::Smart::HTMLParser.
146
147 Aliases for the options:
148
149 SMART|REGEXP => XML::Smart::Parser
150 HTML => XML::Smart::HTMLParser
151
152 Default:
153
154 If not set it will look for XML::Parser and load it. If
155 XML::Parser can't be loaded it will use XML::Smart::Parser,
156 which is actually a clone of XML::Parser::Lite with some
157 fixes.
158
OPTIONS You can force upper case or lower case for tags (nodes)
and arguments (attributes), and set some other extras.
161
162 lowtag Make the tags lower case.
163
164 lowarg Make the arguments lower case.
165
upertag Make the tags upper case.

uperarg Make the arguments upper case.
169
170 arg_single
Set the value of arguments to 1 when they have an
undef value.

** This option works only when the XML is parsed by
XML::Smart::HTMLParser, since it accepts arguments
without values:
177
178 my $xml = new XML::Smart(
179 '<root><foo arg1="" flag></root>' ,
180 'XML::Smart::HTMLParser' ,
181 arg_single => 1 ,
182 ) ;
183
In this example the option "arg_single" is used, which
sets flag to 1, while arg1 keeps its null string value
("").
187
188 Here's the tree of the example above:
189
190 'root' => {
191 'foo' => {
192 'flag' => 1,
193 'arg1' => ''
194 },
195 },
196
197 use_spaces
198 Accept contents that have only spaces.
199
on_start (CODE) *optional
Code/sub to call at the start of a tag.

** This is called after XML::Smart has parsed the
tag, and should be used only if you want to change
the tree.

on_char (CODE) *optional
Code/sub to call on content.

** This is called after XML::Smart has parsed the
tag, and should be used only if you want to change
the tree.

on_end (CODE) *optional
Code/sub to call at the end of a tag.

** This is called after XML::Smart has parsed the
tag, and should be used only if you want to change
the tree.
220
** These options are applied when the XML data is loaded. For
XML generation see data() OPTIONS.
223
224 Examples of use:
225
226 my $xml_from_url = XML::Smart->new("http://www.perlmonks.org/index.pl?node_id=16046") ;
227
228 ...
229
230 my $xml_from_str = XML::Smart->new(q`<?xml version="1.0" encoding="iso-8859-1" ?>
231 <root>
232 <foo arg="xyz"/>
233 </root>
234 `) ;
235
236 ...
237
238 my $null_xml = XML::Smart->new() ;
239
240 ...
241
242 my $xml_from_html = XML::Smart->new($html_data , 'html' ,
243 lowtag => 1 ,
244 lowarg => 1 ,
245 on_char => sub {
246 my ( $tag , $pointer , $pointer_back , $cont) = @_ ;
$pointer->{extra_arg} = 123 ; ## add an extra argument.
248 $pointer_back->{$tag}{extra_arg} = 123 ; ## Same, but using the previous pointer.
249 $$cont .= "\n" ; ## append data to the content.
250 }
251 ) ;
252
253 apply_dtd (DTD , OPTIONS)
254 Apply the DTD to the XML tree.
255
256 DTD can be a source, file, GLOB or URL.
257
This method is useful if you need the XML generated by data() to be
formatted according to a specific DTD: elements become nodes
automatically, attributes are checked, required elements and
attributes are created, the element order is set, and so on.
262
263 OPTIONS:
264
265 no_delete BOOL
If TRUE, elements and attributes that are not defined in the
DTD won't be deleted from the XML tree.
268
269 Example of use:
270
271 $xml->apply_dtd(q`
272 <!DOCTYPE cds [
273 <!ELEMENT cds (album+)>
274 <!ATTLIST cds
275 creator CDATA
276 date CDATA #REQUIRED
277 type (a|b|c) #REQUIRED "a"
278 >
279 <!ELEMENT album (#PCDATA)>
280 ]>
281 ` ,
282 no_delete => 1 ,
283 );
284
285 args()
286 Return the arguments names (not nodes).
287
288 args_values()
289 Return the arguments values (not nodes).
290
291 back()
292 Get back one level the pointer in the tree.
293
** See base().
295
296 base()
297 Get back to the base of the tree.
298
Each query to the XML::Smart object returns an object pointing to a
different place in the tree (all sharing the same HASH tree). So, you
can get the main object again (an object that points to the base):

my $srv = $XML->{root}{hosts}{server} ;
my $addr = $srv->{address} ;
305 my $XML2 = $srv->base() ;
306 $XML2->{root}{hosts}...
307
308 content()
309 Return the content of a node:
310
311 ## Data:
312 <foo>my content</foo>
313
314 ## Access:
315
316 my $content = $XML->{foo}->content ;
317 print "<<$content>>\n" ; ## show: <<my content>>
318
319 ## or just:
320 my $content = $XML->{foo} ;
321
322 Also can be used with multiple contents:
323
324 For this XML data:
325
326 <root>
327 content0
328 <tag1 arg="1"/>
329 content1
330 </root>
331
332 Getting all the content:
333
334 my $all_content = $XML->{root}->content ;
335 print "[$all_content]\n" ;
336
337 Output:
338
339 [
340 content0
341
342 content1
343 ]
344
345 Getting in parts:
346
347 my @contents = $XML->{root}->content ;
print "[$contents[0]]\n" ;
print "[$contents[1]]\n" ;

Output:
352
353 [
354 content0
355 ]
356 [
357 content1
358 ]
359
360 Setting multiple contents:
361
362 $XML->{root}->content(0,"aaaaa") ;
363 $XML->{root}->content(1,"bbbbb") ;
364
365 Output now will be:
366
367 [aaaaa]
368 [bbbbb]
369
370 And now the XML data generated will be:
371
372 <root>aaaaa<tag1 arg="1"/>bbbbb</root>
373
374 copy()
375 Return a copy of the XML::Smart object (pointing to the base).
376
377 ** This is good when you want to keep 2 versions of the same XML tree
378 in the memory, since one object can't change the tree of the other!
379
WARNING: set_node(), set_cdata() and set_binary() changes are not
persistent across copy(): once you create a copy, these states are
lost.

WARNING: do not copy after apply_dtd() unless you have checked for
DTD errors.
386
387 cut_root()
388 Cut the root key:
389
390 my $srv = $XML->{rootx}{host}{server} ;
391
392 ## Or if you don't know the root name:
393 $XML = $XML->cut_root() ;
394 my $srv = $XML->{host}{server} ;
395
** Note that this cuts the root of the pointer in the tree. So, if
you are at some place that has more than one key (multiple roots), the
same object is returned without cutting anything.
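
The single-root rule described above can be sketched on a plain Perl
hash. This is only an illustration, not the module's implementation:
cut_single_root is a hypothetical helper, and the real tree also carries
internal keys that are ignored here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sub-tree under the only key of a hash,
# or the hash itself when there is not exactly one key (multiple roots).
sub cut_single_root {
    my ($tree) = @_;
    my @keys = keys %$tree;
    return $tree unless @keys == 1 && ref( $tree->{ $keys[0] } ) eq 'HASH';
    return $tree->{ $keys[0] };
}

my $one_root  = { hosts => { server => { type => 'suse' } } };
my $two_roots = { foo => { a => 1 }, bar => { b => 2 } };

my $cut  = cut_single_root($one_root);     # the {hosts} sub-tree
my $same = cut_single_root($two_roots);    # unchanged: more than one key
```

With one root the caller can address {server} directly; with several
roots nothing is removed, which mirrors the note above.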
399
400 data (OPTIONS)
401 Return the data of the XML object (rebuilding it).
402
403 Options:
404
405 nodtd Do not add in the XML content the DTD applied by the method
406 apply_dtd().
407
noident If set to true the data isn't indented.

nospace If set to true the data isn't indented and has no space
between the tags (unless the CONTENT has it).
412
413 lowtag Make the tags lower case.
414
415 lowarg Make the arguments lower case.
416
upertag Make the tags upper case.

uperarg Make the arguments upper case.
420
421 length If set true, add the attribute 'length' with the size of the
422 data to the xml header (<?xml ...?>). This is useful when
423 you send the data through a socket, since the socket can
424 know the total amount of data to read.
425
426 noheader Do not add the <?xml ...?> header.
427
428 nometagen Do not add the meta generator tag: <?meta
429 generator="XML::Smart" ?>
430
431 meta Set the meta tags of the XML document.
432
433 decode As of VERSION 1.73 there are three different base64
434 encodings that are used. They are picked based on which of
435 them support the data provided. If you want to retrieve data
436 using the 'data' function the resultant xml will have
437 dt:dt="binary.based" contained within it. To retrieve the
438 decoded data use: $XML->data( decode => 1 )
439
440 Examples:
441
442 my $meta = {
443 build_from => "wxWindows 2.4.0" ,
444 file => "wx26.htm" ,
445 } ;
446
447 print $XML->data( meta => $meta ) ;
448
449 __DATA__
<?meta build_from="wxWindows 2.4.0" file="wx26.htm" ?>
451
452 Multiple meta:
453
454 my $meta = [
455 {build_from => "wxWindows 2.4.0" , file => "wx26.htm" } ,
456 {script => "genxml.pl" , ver => "1.0" } ,
457 ] ;
458
459 __DATA__
460 <?meta build_from="wxWindows 2.4.0" file="wx26.htm" ?>
461 <?meta script="genxml.pl" ver="1.0" ?>
462
463 Or set directly the meta tag:
464
465 my $meta = '<?meta foo="bar" ?>' ;
466
467 ## For multiple:
468 my $meta = ['<?meta foo="bar" ?>' , '<?meta x="1" ?>'] ;
469
470 print $XML->data( meta => $meta ) ;
471
tree Set the HASH tree to parse. If not set, the tree of the
XML::Smart object (tree()) is used.
474
475 wild Accept wild tags and arguments.
476
** This won't fix wrong keys and tags.
478
479 sortall Sort all the tags alphabetically. If not set will keep the
480 order of the document loaded, or the order of tag creation.
481 Default: off
482
483 data_pointer (OPTIONS)
484 Make the tree from current point in the XML tree (not from the base as
485 data()).
486
487 Accept the same OPTIONS of the method data().
488
489 dump_tree()
490 Dump the tree of the object using Data::Dumper.
491
492 dump_tree_pointer()
493 Dump the tree of the object, from the pointer, using Data::Dumper.
494
495 dump_pointer()
496 ** Same as dump_tree_pointer().
497
498 i()
499 Return the index of the value.
500
** If the value comes from a hash key (not an ARRAY ref), undef is
returned.
503
504 is_node()
Return true if a key is a node.
506
507 key()
508 Return the key of the value.
509
In list context (wantarray) the index is returned too: return(KEY , I) ;
511
512 nodes()
513 Return the nodes (objects) in the pointer (keys that aren't arguments).
514
515 nodes_keys()
516 Return the nodes names (not the object) in the pointer (keys that
517 aren't arguments).
518
519 null()
520 Return true if the XML object has a null tree or if the pointer is in
521 some place that doesn't exist.
522
523 order()
524 Return the order of the keys. See set_order().
525
526 path()
527 Return the path of the pointer.
528
529 Example:
530
531 /hosts/server[1]/address[0]
532
Note that the index is 0 based and 'address' can be an attribute or a
node, which is not compatible with XPath.
535
536 ** See path_as_xpath().
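
To make the index difference concrete, here is a minimal sketch that
shifts the 0-based indices of a path() string to the 1-based indices
XPath expects. The helper to_one_based is hypothetical and is not how
path_as_xpath() is implemented; in particular it cannot tell an
attribute from a node.

```perl
use strict;
use warnings;

# Hypothetical helper: shift every [N] index from 0-based to 1-based.
# It does not know whether a step is an attribute or a node, so it is
# not a replacement for path_as_xpath().
sub to_one_based {
    my ($path) = @_;
    ( my $out = $path ) =~ s/\[(\d+)\]/'[' . ( $1 + 1 ) . ']'/ge;
    return $out;
}

my $xpath_like = to_one_based('/hosts/server[1]/address[0]');
```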
537
538 path_as_xpath()
539 Return the path of the pointer in the XPath format.
540
541 pointer
542 Return the HASH tree from the pointer.
543
544 pointer_ok
545 Return a copy of the tree of the object, from the pointer, but without
546 internal keys added by XML::Smart.
547
548 root
549 Return the ROOT name of the XML tree (main key).
550
551 ** See also key() for sub nodes.
552
553 save (FILEPATH , OPTIONS)
554 Save the XML data inside a file.
555
556 Accept the same OPTIONS of the method data().
557
558 set_auto
Define the key to be handled automatically. So, data() will decide
automatically whether it's a node, content or attribute.
561
562 ** This method is useful to remove set_node(), set_cdata() and
563 set_binary() changes.
564
565 set_auto_node
566 Define the key as a node, and data() will define automatically if it's
567 CDATA or BINARY.
568
569 ** This method is useful to remove set_cdata() and set_binary()
570 changes.
571
572 set_binary(BOOL)
573 Define the node as a BINARY content when TRUE, or force to not handle
574 it as a BINARY on FALSE.
575
576 Example of node handled as BINARY:
577
578 <root><foo dt:dt="binary.base64">PGgxPnRlc3QgAzwvaDE+</foo></root>
579
580 Original content of foo (the base64 data):
581
582 <h1>test \x03</h1>
583
584 set_cdata(BOOL)
585 Define the node as CDATA when TRUE, or force to not handle it as CDATA
586 on FALSE.
587
588 Example of CDATA node:
589
590 <root><foo><![CDATA[bla bla bla <tag> bla bla]]></foo></root>
591
592 set_node(BOOL)
593 Set/unset the current key as a node (tag).
594
595 ** If BOOL is not defined will use TRUE.
596
WARNING: You cannot call set_node(), copy the object and then call
set_node(0) to unset the node.
599
600 set_order(KEYS)
601 Set the order of the keys (nodes and attributes) in this point.
602
603 set_tag
604 Same as set_node.
605
606 tree()
607 Return the HASH tree of the XML data.
608
609 ** Note that the real HASH tree is returned here. All the other ways
610 return an object that works like a HASH/ARRAY through tie.
611
612 tree_pointer()
613 Same as pointer().
614
615 tree_ok()
616 Return a copy of the tree of the object, but without internal keys
617 added by XML::Smart, like /order and /nodes.
618
619 tree_pointer_ok()
620 Return a copy of the tree of the object, from the pointer, but without
621 internal keys added by XML::Smart.
622
623 xpath() || XPath()
Return an XML::XPath object, based on the XML root in the tree.
625
626 ## look from the root:
627 my $data = $XML->XPath->findnodes_as_string('/') ;
628
** Requires XML::XPath to be installed; it is loaded only when needed.
630
631 xpath_pointer() || XPath_pointer()
Return an XML::XPath object, based on the XML::Smart pointer in the
tree.

## look from this point, so XPath '/' actually starts at /server/:
636
637 my $srvs = $XML->{server} ;
638 my $data = $srvs->XPath_pointer->findnodes_as_string('/') ;
639
** Requires XML::XPath to be installed; it is loaded only when needed.
641
642 ANNIHILATE
XML::Smart uses XML::XPath which, for performance reasons, leaks
memory. To ensure that this memory is freed you can explicitly call
ANNIHILATE before the XML::Smart object goes out of scope.
646
648 To access the data you use the object in a way similar to HASH and
649 ARRAY:
650
651 my $XML = XML::Smart->new('file.xml') ;
652
653 my $server = $XML->{server} ;
654
But when you get a key such as {server}, you are actually accessing
the data through tie(), not directly in the HASH tree inside the
object. (This fixes wrong accesses):
658
659 ## {server} is a normal key, not an ARRAY ref:
660
661 my $server = $XML->{server}[0] ; ## return $XML->{server}
662 my $server = $XML->{server}[1] ; ## return UNDEF
663
664 ## {server} has an ARRAY with 2 items:
665
666 my $server = $XML->{server} ; ## return $XML->{server}[0]
667 my $server = $XML->{server}[0] ; ## return $XML->{server}[0]
668 my $server = $XML->{server}[1] ; ## return $XML->{server}[1]
669
670 To get all the values of multiple elements/keys:
671
## This works with only a string inside {address}, or with an ARRAY ref:
my @addresses = @{$XML->{server}{address}} ;
674
675 Select search
When you don't know the position of the nodes, you can select them by
some attribute value:
678
679 my $server = $XML->{server}('type','eq','suse') ; ## return $XML->{server}[1]
680
681 Syntax for the select search:
682
683 (NAME, CONDITION , VALUE)
684
685 NAME The attribute name in the node (tag).
686
687 CONDITION Can be
688
689 eq ne == != <= >= < >
690
691 For REGEX:
692
693 =~ !~
694
695 ## Case insensitive:
696 =~i !~i
697
698 VALUE The value.
699
700 For REGEX use like this:
701
702 $XML->{server}('type','=~','^s\w+$') ;
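
The (NAME, CONDITION, VALUE) triple can be pictured as a filter over a
list of nodes. The sketch below works on plain hashes and is an
assumption about the semantics only; select_nodes and %CONDITION are
hypothetical names, not part of XML::Smart.

```perl
use strict;
use warnings;

# Dispatch table for the documented conditions. The regex variants
# compile VALUE as a pattern.
my %CONDITION = (
    'eq'  => sub { $_[0] eq $_[1] },
    'ne'  => sub { $_[0] ne $_[1] },
    '=='  => sub { $_[0] == $_[1] },
    '!='  => sub { $_[0] != $_[1] },
    '<='  => sub { $_[0] <= $_[1] },
    '>='  => sub { $_[0] >= $_[1] },
    '<'   => sub { $_[0] <  $_[1] },
    '>'   => sub { $_[0] >  $_[1] },
    '=~'  => sub { my ( $v, $re ) = @_; $v =~ /$re/ },
    '!~'  => sub { my ( $v, $re ) = @_; $v !~ /$re/ },
    '=~i' => sub { my ( $v, $re ) = @_; $v =~ /$re/i },
    '!~i' => sub { my ( $v, $re ) = @_; $v !~ /$re/i },
);

# Hypothetical helper: return the nodes whose attribute NAME satisfies
# CONDITION against VALUE. Nodes without the attribute are skipped.
sub select_nodes {
    my ( $nodes, $name, $cond, $value ) = @_;
    my $test = $CONDITION{$cond} or die "unknown condition '$cond'";
    return grep { defined $_->{$name} && $test->( $_->{$name}, $value ) } @$nodes;
}

my @servers = (
    { os => 'linux', type => 'redhat',    version => '8.0' },
    { os => 'linux', type => 'suse',      version => '7.0' },
    { os => 'linux', type => 'conectiva', version => '9.0' },
);

my @suse  = select_nodes( \@servers, 'type',    'eq', 'suse' );
my @s_re  = select_nodes( \@servers, 'type',    '=~', '^s\w+$' );
my @newer = select_nodes( \@servers, 'version', '>=', 8 );
```

A dispatch table keeps each condition a one-line closure, so adding or
removing an operator does not touch the filtering loop.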
703
704 Select attributes in multiple nodes:
705 You can get the list of values of an attribute looking in all multiple
706 nodes:
707
708 ## Get all the server types:
709 my @types = $XML->{server}('[@]','type') ;
710
711 Also as:
712
713 my @types = $XML->{server}{type}('<@') ;
714
715 Without the resource:
716
717 my @list ;
718 my @servers = @{$XML->{server}} ;
719
720 foreach my $servers_i ( @servers ) {
721 push(@list , $servers_i->{type} ) ;
722 }
723
724 Return format
725 You can change the returned format:
726
727 Syntax:
728
729 (TYPE)
730
731 Where TYPE can be:
732
733 $ ## the content.
734 @ ## an array (list of multiple values).
735 % ## a hash.
736 . ## The exact point in the tree, not an object.
737
$@ ## an array, but with the contents, not objects.
$% ## a hash, but the values are the contents, not objects.
740
741 ## The use of $@ and $% is good if you don't want to keep the object
742 ## reference (and save memory).
743
744 @keys ## The keys of the node. note that if you have a key with
745 ## multiple nodes, it will be replicated (this is the
746 ## difference of "keys %{$this->{node}}" ).
747
748 <@ ## Return the attribute in the previous node, but looking for
749 ## multiple nodes. Example:
750
751 my @names = $this->{method}{wxFrame}{arg}{name}('<@') ;
752 #### @names = (parent , id , title) ;
753
754 <xml> ## Return a XML data from this point.
755
756 __DATA__
757 <method>
758 <wxFrame return="wxFrame">
759 <arg name="parent" type="wxWindow" />
760 <arg name="id" type="wxWindowID" />
761 <arg name="title" type="wxString" />
762 </wxFrame>
763 </method>
764
765 Example:
766
767 ## A servers content
768 my $name = $XML->{server}{name}('$') ;
769 ## ... or:
770 my $name = $XML->{server}{name}->content ;
771 ## ... or:
772 my $name = $XML->{server}{name} ;
773 $name = "$name" ;
774
775 ## All the servers
776 my @servers = $XML->{server}('@') ;
777 ## ... or:
778 my @servers = @{$XML->{server}} ;
779
780 ## It still has the object reference:
$servers[0]->{name} ;
782
783 ## Without the reference:
784 my @servers = $XML->{server}('$@') ;
785
786 ## A XML data, same as data_pointer():
787 my $xml_data = $XML->{server}('<xml>') ;
788
789 CONTENT
790 If a {key} has a content you can access it directly from the variable
791 or from the method:
792
793 my $server = $XML->{server} ;
794
795 print "Content: $server\n" ;
796 ## ...or...
797 print "Content: ". $server->content ."\n" ;
798
799 So, if you use the object as a string it works as a string, if you use
800 as an object it works as an object! ;-P
801
802 **See the method content() for more.
803
Creating XML data is easy: you just use the object as a normal HASH,
and you don't need to care about multiple nodes or ARRAY
creation/conversion!
807
808 ## Create a null XML object:
809 my $XML = XML::Smart->new() ;
810
811 ## Add a server to the list:
812 $XML->{server} = {
813 os => 'Linux' ,
814 type => 'mandrake' ,
815 version => 8.9 ,
816 address => '192.168.3.201' ,
817 } ;
818
819 ## The data now:
820 <server address="192.168.3.201" os="Linux" type="mandrake" version="8.9"/>
821
## Add a new address to the server. This creates an ARRAY and converts
## the previous key to an ARRAY:
824 $XML->{server}{address}[1] = '192.168.3.202' ;
825
826 ## The data now:
827 <server os="Linux" type="mandrake" version="8.9">
828 <address>192.168.3.201</address>
829 <address>192.168.3.202</address>
830 </server>
831
After creating your XML tree you just save it or get the data:
833
834 ## Get the data:
835 my $data = $XML->data ;
836
837 ## Or save it directly:
838 $XML->save('newfile.xml') ;
839
840 ## Or send to a socket:
841 print $socket $XML->data(length => 1) ;
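
On the receiving side, the announced length has to be read back from
the header. The sketch below only extracts the number; it assumes the
attribute appears as length="N" inside the <?xml ...?> declaration, and
header_length is a hypothetical helper. Exactly which bytes N counts is
not specified here, so check it against real output before relying on
it.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the value of a length="N" attribute out of
# the <?xml ...?> declaration at the start of a buffer.
# Returns undef when the declaration or the attribute is missing.
sub header_length {
    my ($buffer) = @_;
    return undef unless $buffer =~ /^\s*<\?xml\b([^?]*)\?>/;
    my $attrs = $1;
    return $attrs =~ /\blength\s*=\s*["'](\d+)["']/ ? $1 : undef;
}

# Hand-written sample headers; the number is arbitrary.
my $sample    = qq{<?xml version="1.0" encoding="iso-8859-1" length="88" ?>\n<root/>\n};
my $announced = header_length($sample);
my $missing   = header_length(qq{<?xml version="1.0" ?>\n<root/>\n});
```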
842
844 From version 1.2 XML::Smart can handle binary data and CDATA blocks
845 automatically.
846
847 When parsing, binary data will be detected as:
848
849 <code dt:dt="binary.base64">f1NPTUUgQklOQVJZIERBVEE=</code>
850
This is the official format for binary data described at XML.com
<http://www.xml.com/pub/a/98/07/binary/binary.html>. The content is
decoded from base64 and saved in the object tree.
854
855 CDATA will be parsed as any other content, since CDATA is only a block
856 that won't be parsed.
857
858 When creating XML data, like at $XML->data(), the binary format and
859 CDATA are detected using these rules:
860
861 BINARY:
862 - If your data has characters that can't be in XML.
863
864 * Characters accepted:
865
866 \s \w \d
867 !"#$%&'()*+,-./:;<=>?@[\]^`{|}~
868 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8e, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96,
869 0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9e, 0x9f, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa,
870 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc,
871 0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce,
872 0xcf, 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, 0xe0,
873 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef, 0xf0, 0xf1, 0xf2,
874 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff, 0x20
875
876 TODO: 0x80, 0x81, 0x8d, 0x8f, 0x90, 0xa0
877
CDATA:
- If the content has tags: <...>

CONTENT: (<tag>content</tag>)
- If the content has \r\n\t, or ' and " at the same time.
883
884 So, this will be a CDATA content:
885
886 <code><![CDATA[
887 line1
888 <tag_not_parsed>
889 line2
890 ]]></code>
891
892 If binary content is detected, it will be converted to base64 and a
893 dt:dt attribute added in the tag to tell the format.
894
895 <code dt:dt="binary.base64">f1NPTUUgQklOQVJZIERBVEE=</code>
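
The round trip can be reproduced with the core MIME::Base64 module. The
sketch below is not the module's code: looks_binary is a deliberately
simplified stand-in for the accepted-character rule listed above, and
wrap_content is a hypothetical helper that skips entity escaping.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Simplified stand-in for the detection rule: treat DEL and any control
# character other than tab, CR and LF as "binary". The module's real
# accepted list is the one documented above.
sub looks_binary {
    my ($data) = @_;
    return $data =~ /[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]/ ? 1 : 0;
}

# Hypothetical helper: wrap content in a tag, base64-encoding it when needed.
sub wrap_content {
    my ( $tag, $data ) = @_;
    return "<$tag>$data</$tag>" unless looks_binary($data);
    my $b64 = encode_base64( $data, '' );    # '' = no line breaks
    return qq{<$tag dt:dt="binary.base64">$b64</$tag>};
}

my $binary_node = wrap_content( 'code', "\x7fSOME BINARY DATA" );
my $plain_node  = wrap_content( 'code', 'plain' );
my $decoded     = decode_base64('f1NPTUUgQklOQVJZIERBVEE=');
```

Decoding the documented sample yields the byte 0x7f followed by the
text "SOME BINARY DATA", which is why that content counts as binary.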
896
897 NOTE: As of VERSION 1.73 there are three different base64 encodings
898 that are used. They are picked based on which of them support the data
899 provided. If you want to retrieve data using the 'data' function the
900 resultant xml will have dt:dt="binary.based" contained within it. To
901 retrieve the decoded data use: $XML->data( decode => 1 )
902
XML::Smart supports only these 2 encodings: Unicode (UTF-8) and
ASCII-extended (ISO-8859-1), which should be enough. (Note that UTF-8
is only supported on Perl 5.8+.)
907
908 When creating XML data, if any UTF-8 character is detected the encoding
909 attribute in the <?xml ...?> header will be set to UTF-8:
910
911 <?xml version="1.0" encoding="utf-8" ?>
912 <data>0x82, 0x83</data>
913
914 If not, the iso-8859-1 is used:
915
916 <?xml version="1.0" encoding="iso-8859-1" ?>
917 <data>0x82</data>
918
When loading XML data with UTF-8, Perl (5.8+) should do all the work
internally.
921
You can use the special parser XML::Smart::HTMLParser to treat HTML,
or XML data that is not well formed, as XML.

The differences between a normal XML parser and XML::Smart::HTMLParser
are:
928
929 - Accept values without quotes:
930 <foo bar=x>
931
932 - Accept any data in the values, including <> and &:
933 <root><echo sample="echo \"Hello!\">out.txt"></root>
934
- Accept URI values without quotes:
936 <link url=http://www.foo.com/dir/file?query?q=v&x=y target=#_blank>
937
- No need to close the tags by adding the '/' before '>':
<root><foo bar="1"></root>

** Note that the parser will try hard to detect the nodes, and whether
they auto-close or not.
943
- No need to have only one root:
945 <foo>data</foo><bar>data</bar>
946
So, XML::Smart::HTMLParser is a wild way to load marked-up data (like
HTML), or a convenience if you don't want to care about quotes, end
tags, etc. when writing your XML data by hand. You can write a bad XML
file by hand, load it with XML::Smart::HTMLParser, and rewrite it
well formed by saving it again! ;-P
952
953 ** Note that <SCRIPT> tags will only parse right if the content is
954 inside comments <!--...-->, since they can have tags:
955
956 <SCRIPT LANGUAGE="JavaScript"><!--
957 document.writeln("some <tag> in the string");
958 --></SCRIPT>
959
Entities (ENTITY) are handled by the parser. So, if you use XML::Parser
it will do the whole job fine. But if you use XML::Smart::Parser or
XML::Smart::HTMLParser, only the basic entities (defaults) will be
parsed:

&lt; => The less than sign (<).
&gt; => The greater than sign (>).
&amp; => The ampersand (&).
&apos; => The single quote or apostrophe (').
&quot; => The double quote (").

&#ddd; => An ASCII character or a Unicode character (>255), where ddd is a decimal.
&#xHHH; => A Unicode character, where HHH is in hexadecimal.
974
When creating XML data, already existing entities won't be changed, and
the characters '<', '&' and '>' will be converted to the appropriate
entity.

** Note that if a content has a <tag>, the characters '<' and '>'
won't be converted to entities, and this content will be placed inside
a CDATA block.
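
The escaping rule for ordinary content can be sketched with three
substitutions. escape_content is a hypothetical helper, not the
module's code, and it does not cover the CDATA case from the note
above; the lookahead is one assumed way of recognising "already
existing" entities.

```perl
use strict;
use warnings;

# Hypothetical helper: escape '<', '>' and '&' for element content,
# leaving anything that already looks like an entity untouched.
sub escape_content {
    my ($text) = @_;
    $text =~ s/&(?!(?:[A-Za-z]\w*|#\d+|#x[0-9A-Fa-f]+);)/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $escaped = escape_content('1 < 2 & 3 > 2, already &amp; and &#169;');
```

The ampersand is handled first so that the entities produced for '<'
and '>' are not escaped a second time.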
982
Everyone who has tried to use Perl HASH and ARRAY to access XML
data, as in XML::Simple, has had problems adding new nodes, or
accessing a node without knowing whether it's inside an ARRAY, a
HASH or a HASH key. XML::Smart creates a very dynamic way to
access the data, since any node/point in the tree can be a HASH and an
ARRAY at the same time. You also have other extra resources, like a
search for nodes by attribute:
991
992 my $server = $XML->{server}('type','eq','suse') ; ## This syntax is not wrong! ;-)
993
994 ## Instead of:
995 my $server = $XML->{server}[1] ;
996
997 __DATA__
998 <hosts>
<server os="linux" type="redhat" version="8.0"/>
<server os="linux" type="suse" version="7.0"/>
1001 </hosts>
1002
The idea for this module came from the difficulty of accessing a
complex structure in XML. You need to know what this structure is,
something that is generally done by looking at the XML file (which is
wrong). At the same time it is hard to always check the structure (by
code) before accessing it. XML is a good and easy format to declare
your data, but extracting it as a tree, at least in my opinion, isn't
easy. To fix that, I thought of accessing the data with some query
language, like SQL. The first idea was to access it using something
like:
1011
1012 XML.foo.bar.baz{arg1}
1013
1014 X = XML.foo.bar*
1015 X.baz{arg1}
1016
1017 XML.hosts.server[0]{argx}
1018
And I saw that this is very similar to Hashes and Arrays in Perl:
1020
1021 $XML->{foo}{bar}{baz}{arg1} ;
1022
1023 $X = $XML->{foo}{bar} ;
1024 $X->{baz}{arg1} ;
1025
1026 $XML->{hosts}{server}[0]{argx} ;
1027
But the problem with Hash and Array is not knowing whether you have an
Array reference or not. For example, in XML::Simple:

## This is very different...
$XML->{server}{address} ;
## ... from this:
$XML->{server}{address}[0] ;

So, why not make both ways work? Because you need to do something
crazy!
1038
To create XML::Smart, I first created the module
Object::MultiType. With it you can have an object that works at the
same time as a HASH, ARRAY, SCALAR, CODE & GLOB. So you can do things
like this with the same object:
1043
1044 $obj = Object::MultiType->new() ;
1045
1046 $obj->{key} ;
1047 $obj->[0] ;
1048 $obj->method ;
1049
1050 @l = @{$obj} ;
1051 %h = %{$obj} ;
1052
1053 &$obj(args) ;
1054
1055 print $obj "send data\n" ;
1056
It seems crazy, and gets crazier if you use tie() inside it, which
is what XML::Smart does.
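
The multi-type idea can be approximated with Perl's core overload
pragma. The class below is a hypothetical sketch (MultiSketch is an
invented name) and makes no claim about how Object::MultiType is
actually written; it only shows that one object can answer hash, array,
code and string access.

```perl
use strict;
use warnings;

package MultiSketch;    # hypothetical class, not Object::MultiType itself

use overload
    '%{}' => sub { ${ $_[0] }->{hash} },      # $obj->{key}
    '@{}' => sub { ${ $_[0] }->{array} },     # $obj->[0]
    '&{}' => sub { ${ $_[0] }->{code} },      # $obj->(...)
    '""'  => sub { ${ $_[0] }->{string} },    # "$obj"
    fallback => 1;

# The object is a blessed SCALAR ref, so the hash and array dereferences
# above can be overloaded without recursing into themselves.
sub new {
    my ( $class, %parts ) = @_;
    my $data = {%parts};
    return bless \$data, $class;
}

package main;

my $obj = MultiSketch->new(
    hash   => { key => 'value' },
    array  => [ 'first', 'second' ],
    code   => sub { return "called with @_" },
    string => 'plain text',
);

my $from_hash   = $obj->{key};
my $from_array  = $obj->[1];
my $from_code   = $obj->('x');
my $from_string = "$obj";
```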
1059
For XML::Smart, access in the Hash and Array way passes through
tie(). In other words, you have a tied HASH and a tied ARRAY inside it.
The tied Hash and Array work together, so you can access a Hash key
as index 0 of an Array, or access index 0 as the Hash key:
1064
1065 %hash = (
1066 key => ['a','b','c']
1067 ) ;
1068
1069 $hash->{key} ## return $hash{key}[0]
1070 $hash->{key}[0] ## return $hash{key}[0]
1071 $hash->{key}[1] ## return $hash{key}[1]
1072
1073 ## Inverse:
1074
1075 %hash = ( key => 'a' ) ;
1076
1077 $hash->{key} ## return $hash{key}
1078 $hash->{key}[0] ## return $hash{key}
1079 $hash->{key}[1] ## return undef
1080
The best thing about this resource is that it avoids wrong accesses to
the data, and the warnings you get when you try to access a Hash where
there is an Array (and the inverse), something that generally makes
the script die().
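
The "key as index 0" behaviour can be imitated with a small tied array
built on the core Tie::Array module. SlotArray is a hypothetical class
written for this illustration, not the tie class used inside
XML::Smart: it views one hash slot as a list whether the slot holds a
plain string or an ARRAY ref, and converts the slot on the first write.

```perl
use strict;
use warnings;

package SlotArray;    # hypothetical class for illustration only
use Tie::Array;
our @ISA = ('Tie::Array');

# Tie to a reference to a slot holding either a scalar or an ARRAY ref.
sub TIEARRAY {
    my ( $class, $slot_ref ) = @_;
    return bless { slot => $slot_ref }, $class;
}

sub FETCHSIZE {
    my $v = ${ $_[0]{slot} };
    return 0 unless defined $v;
    return ref($v) eq 'ARRAY' ? scalar @$v : 1;
}

sub FETCH {
    my ( $self, $i ) = @_;
    my $v = ${ $self->{slot} };
    return ref($v) eq 'ARRAY' ? $v->[$i] : ( $i == 0 ? $v : undef );
}

# Convert a scalar slot to an ARRAY ref, keeping the old value at index 0.
sub _as_array {
    my ($slot) = @_;
    $$slot = [ defined $$slot ? $$slot : () ] unless ref($$slot) eq 'ARRAY';
    return $$slot;
}

sub STORE     { my ( $self, $i, $val ) = @_; _as_array( $self->{slot} )->[$i] = $val }
sub STORESIZE { my ( $self, $n ) = @_; $#{ _as_array( $self->{slot} ) } = $n - 1 }

package main;

my %server = ( address => '192.168.2.100' );
tie my @addr, 'SlotArray', \$server{address};

my $first         = $addr[0];    # the plain string, seen as index 0
my $second_before = $addr[1];    # undef, and no error
$addr[1] = '192.168.2.101';      # the slot becomes an ARRAY ref
push @addr, '192.168.2.102';
```

Reads never fail on the "wrong" shape, and the conversion happens only
on write, which matches the behaviour described above.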
1084
1085 Once having an easy access to the data, you can use the same resource
1086 to create data! For example:
1087
1088 ## Previous data:
1089 <hosts>
1090 <server address="192.168.2.100" os="linux" type="conectiva" version="9.0"/>
1091 </hosts>
1092
1093 ## Now you have {address} as a normal key with a string inside:
1094 $XML->{hosts}{server}{address}
1095
## And to add a new address, the key {address} needs to be an ARRAY ref!
## So, XML::Smart makes the conversion: ;-P
1098 $XML->{hosts}{server}{address}[1] = '192.168.2.101' ;
1099
## Adding to a list whose size you don't know:
1101 push(@{$XML->{hosts}{server}{address}} , '192.168.2.102') ;
1102
1103 ## The data now:
1104 <hosts>
<server os="linux" type="conectiva" version="9.0">
1106 <address>192.168.2.100</address>
1107 <address>192.168.2.101</address>
1108 <address>192.168.2.102</address>
1109 </server>
1110 </hosts>
1111
Then, after changing your XML tree using the Hash and Array resources,
you just get the data rebuilt (through the Hash tree inside the object):
1114
1115 my $xmldata = $XML->data ;
1116
But note that XML::Smart always returns an object! Even when you get a
final key. So this actually returns another object, pointing (inside
it) to the key:
1120
1121 $addr = $XML->{hosts}{server}{address}[0] ;
1122
1123 ## Since $addr is an object you can TRY to access more data:
$addr->{foo}{bar} ; ## This doesn't raise warnings! It just returns UNDEF.
1125
1126 ## But you can use it like a normal SCALAR too:
1127
1128 print "$addr\n" ;
1129
1130 $addr .= ':80' ; ## After this $addr isn't an object any more, just a SCALAR!
1131
1133 * Finish XPath implementation.
1134 * DTD - Handle <!DOCTYPE> gracefully.
1135 * Implement a better way to declare meta tags.
1136 * Add 0x80, 0x81, 0x8d, 0x8f, 0x90, 0xa0 ( multi byte characters to the list of accepted binary characters )
1137 * Ensure object copy holds more in state including: ->data( wild => 1 )
1138
1140 XML::Parser, XML::Parser::Lite, XML::XPath, XML.
1141
Object::MultiType - This is the module that makes everything possible,
and was created specially for XML::Smart. ;-P
1144
1145 ** See the test.pl script for examples of use.
1146
1147 XML.com <http://www.xml.com>
1148
1150 Graciliano M. P. "<gm at virtuasites.com.br>"
1151
I will appreciate any type of feedback (including your opinions and/or
suggestions). ;-P

Enjoy, and thanks to those who are enjoying this tool and have sent
e-mails! ;-P
1157
1159 Harish Madabushi, "<harish.tmh at gmail.com>"
1160
1162 Please report any bugs or feature requests to "bug-xml-smart at
1163 rt.cpan.org", or through the web interface at
1164 <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=XML-Smart>. Both the
1165 author and the maintainer will be notified, and then you'll
1166 automatically be notified of progress on your bug as changes are made.
1167
1169 You can find documentation for this module with the perldoc command.
1170
1171 perldoc XML::Smart
1172
1173 You can also look for information at:
1174
1175 • RT: CPAN's request tracker (report bugs here)
1176
1177 <http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-Smart>
1178
1179 • AnnoCPAN: Annotated CPAN documentation
1180
1181 <http://annocpan.org/dist/XML-Smart>
1182
1183 • CPAN Ratings
1184
1185 <http://cpanratings.perl.org/d/XML-Smart>
1186
1187 • Search CPAN
1188
1189 <http://search.cpan.org/dist/XML-Smart/>
1190
1191 • GitHub CPAN
1192
1193 <https://github.com/harishmadabushi/XML-Smart>
1194
1196 Thanks to Rusty Allen for the extensive tests of CDATA and BINARY
1197 handling of XML::Smart.
1198
Thanks to Ted Haining for pointing out a Perl-5.8.0 bug for tied keys
of a HASH.

Thanks to everybody who has sent ideas, patches or pointed out bugs.
1203
1205 Copyright 2003 Graciliano M. P.
1206
1207 This program is free software; you can redistribute it and/or modify it
1208 under the same terms as Perl itself.

perl v5.36.0 2023-01-20 XML::Smart(3)