XML::Tiny(3) User Contributed Perl Documentation XML::Tiny(3)

XML::Tiny - simple lightweight parser for a subset of XML

XML::Tiny is a simple lightweight parser for a subset of XML.

    use XML::Tiny qw(parsefile);
    open(my $xmlfile, '<', 'something.xml') or die "Can't open something.xml: $!";
    my $document = parsefile($xmlfile);

This will leave $document looking something like this:

    [
        {
            type    => 'e',
            attrib  => { ... },
            name    => 'rootelementname',
            content => [
                ...
                more elements and text content
                ...
            ]
        }
    ]

The "parsefile" function is optionally exported. By default nothing is
exported. There is no objecty interface.

parsefile
This takes at least one parameter, optionally more. The compulsory
parameter may be:

a filename
    in which case the file is read and parsed;

a string of XML
    in which case it is read and parsed. How do we tell if we've got a
    string or a filename? If it begins with "_TINY_XML_STRING_" then
    it's a string. That prefix is, of course, ignored when it comes to
    actually parsing the data. This is intended primarily for use by
    wrappers which want to retain compatibility with Ye Aunciente Perl.
    Normal users who want to pass in a string would be expected to use
    IO::Scalar.

a glob-ref or IO::Handle object
    in which case again, the file is read and parsed.

The first case (a filename) is for compatibility with older perls, but
makes no attempt to properly deal with character sets. If you open a
file in a character-set-friendly way and then pass in a handle / object,
then the function should Do The Right Thing as it only ever works with
character data.
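
As a rough sketch of two of these calling styles: the example below
assumes XML::Tiny is installed and that perl is new enough (5.8 or
later) to open in-memory filehandles. The sample XML and the variable
names are invented for illustration.

```perl
use strict;
use warnings;
use XML::Tiny qw(parsefile);

my $xml = '<greeting lang="en">hello</greeting>';

# A string, marked with the documented "_TINY_XML_STRING_" prefix.
my $from_string = parsefile('_TINY_XML_STRING_' . $xml);

# A filehandle. An in-memory handle stands in for a real file here; for a
# file on disk you would open it with a suitable layer instead, e.g.
#   open(my $fh, '<:encoding(UTF-8)', 'something.xml') or die $!;
open(my $fh, '<', \$xml) or die "Can't open in-memory handle: $!";
my $from_handle = parsefile($fh);
close($fh);

print $from_string->[0]{name}, "\n";
print $from_handle->[0]{content}[0]{content}, "\n";
```

Both calls should yield the same structure; only the way the data reaches
the parser differs.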

The remaining parameters are a list of key/value pairs to make a hash
of options:

fatal_declarations
    If set to true, <!ENTITY...> and <!DOCTYPE...> declarations in the
    document are fatal errors - otherwise they are *ignored*.

no_entity_parsing
    If set to true, the five built-in entities are passed through
    unparsed. Note that special characters in CDATA and attributes may
    have been turned into "&amp;", "&lt;" and friends.

strict_entity_parsing
    If set to true, any unrecognised entities (ie, those outside the
    core five plus numeric entities) cause a fatal error. If you set
    both this and "no_entity_parsing" (but why would you do that?) then
    the latter takes precedence.

Obviously, if you want to maximise compliance with the XML spec,
you should turn on fatal_declarations and strict_entity_parsing.
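
The effect of the stricter options might be sketched like this. The
sample document is made up, XML::Tiny is assumed to be installed, and the
example deliberately relies only on the fact that the strict parse dies,
not on the wording of the error message.

```perl
use strict;
use warnings;
use XML::Tiny qw(parsefile);

my $xml = '_TINY_XML_STRING_<!DOCTYPE note><note>hi</note>';

# Default behaviour: the declaration is ignored and the document parses.
my $lenient = parsefile($xml);

# With fatal_declarations set, the same document is a fatal error,
# so wrap the call in eval to catch it.
my $strict = eval {
    parsefile($xml, fatal_declarations => 1, strict_entity_parsing => 1);
};
my $error = $@;

print "lenient root: $lenient->[0]{name}\n";
print "strict parse failed: $error" if !defined $strict;
```

Options are passed as plain key/value pairs after the compulsory first
parameter, so they can be combined freely in one call.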

The function returns a structure describing the document. This
contains one or more nodes, each being either an 'element' node or a
'text' node. The structure is an arrayref which contains a single
'element' node which represents the document entity. The arrayref is
redundant, but exists for compatibility with XML::Parser::EasyTree.

Element nodes are hashrefs with the following keys:

type
    The node's type, represented by the letter 'e'.

name
    The element's name.

attrib
    A hashref containing the element's attributes, as key/value pairs
    where the key is the attribute name.

content
    An arrayref of the element's contents. The array's contents are a
    list of nodes, in the order they were encountered in the document.

Text nodes are hashrefs with the following keys:

type
    The node's type, represented by the letter 't'.

content
    A scalar piece of text.
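
Walking the returned structure needs nothing more than a recursive
function. The following sketch uses a hand-built tree in the shape
described above rather than real parser output, and text_of and
find_elements are hypothetical helpers, not part of XML::Tiny.

```perl
use strict;
use warnings;

# Hand-built tree in the documented shape, standing in for parsefile() output.
my $tree = [
    {
        type    => 'e',
        name    => 'p',
        attrib  => { class => 'intro' },
        content => [
            { type => 't', content => 'Hello, ' },
            {
                type    => 'e',
                name    => 'b',
                attrib  => {},
                content => [ { type => 't', content => 'world' } ],
            },
            { type => 't', content => '!' },
        ],
    },
];

# Hypothetical helper: concatenate all text beneath a node, in document order.
sub text_of {
    my ($node) = @_;
    return $node->{content} if $node->{type} eq 't';
    return join '', map { text_of($_) } @{ $node->{content} };
}

# Hypothetical helper: collect every element with the given name.
sub find_elements {
    my ($node, $name) = @_;
    return () if $node->{type} ne 'e';
    my @found = $node->{name} eq $name ? ($node) : ();
    push @found, map { find_elements($_, $name) } @{ $node->{content} };
    return @found;
}

my $root = $tree->[0];
my @bold = find_elements($root, 'b');

print text_of($root), "\n";
print scalar(@bold), " <b> element(s)\n";
```

Note that "content" means different things in the two node types: an
arrayref of child nodes for elements, a plain string for text nodes. Any
walker has to check "type" before using it.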

If you prefer a DOMmish interface, then look at XML::Tiny::DOM on the
CPAN.

With other modules
The "parsefile" function is so named because it is intended to work in
a similar fashion to XML::Parser with the XML::Parser::EasyTree style.
Instead of saying this:

    use XML::Parser;
    use XML::Parser::EasyTree;
    $XML::Parser::EasyTree::Noempty = 1;
    my $p = XML::Parser->new(Style => 'EasyTree');
    my $tree = $p->parsefile('something.xml');

you would say:

    use XML::Tiny;
    my $tree = XML::Tiny::parsefile('something.xml');

Any valid document that can be parsed like that using XML::Tiny should
produce identical results if you use the above example of how to use
XML::Parser::EasyTree.

If you find a document where that is not the case, please report it as
a bug.

With perl 5.004
The module is intended to be fully compatible with every version of
perl back to and including 5.004, and may be compatible with even older
versions of perl 5.

The lack of Unicode and friends in older perls means that XML::Tiny
does nothing with character sets. If you have a document with a funny
character set, then you will need to open the file in an appropriate
mode using a character-set-friendly perl and pass the resulting file
handle to the module. BOMs are ignored.

The subset of XML that we understand
Element tags and attributes
    Including "self-closing" tags like <pie type = 'steak n kidney' />;

Comments
    Which are ignored;

The five "core" entities
    ie "&amp;", "&lt;", "&gt;", "&apos;" and "&quot;";

Numeric entities
    eg "&#65;" and "&#x41;";

CDATA
    This is simply turned into PCDATA before parsing. Note how this
    may interact with the various entity-handling options;

The following parts of the XML standard are handled incorrectly or not
at all - this is not an exhaustive list:

Namespaces
    While documents that use namespaces will be parsed just fine,
    there's no special treatment of them. Their names are preserved in
    element and attribute names like 'rdf:RDF'.

DTDs and Schemas
    This is not a validating parser. <!DOCTYPE...> declarations are
    ignored if you've not made them fatal.

Entities and references
    <!ENTITY...> declarations are ignored if you've not made them
    fatal. Unrecognised entities are ignored by default, as are naked
    & characters. This means that if entity parsing is enabled you
    won't be able to tell the difference between "&amp;nbsp;" and
    "&nbsp;". If your document might use any non-core entities then
    please consider using the "no_entity_parsing" option, and then use
    something like HTML::Entities.

Processing instructions
    These are ignored.

Whitespace
    We do not guarantee to correctly handle leading and trailing
    whitespace.

Character sets
    This is not practical with older versions of perl.
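
Since namespace prefixes are simply left in the names, a caller who
cares about them has to split them off itself. A minimal sketch, using a
hand-built element rather than real parser output: split_qname is a
hypothetical helper, and looking up the "xmlns:" attribute on the same
element is a simplification, since real namespace declarations are
inherited from ancestor elements.

```perl
use strict;
use warnings;

# Hypothetical helper: split a qualified name into (prefix, local name).
# The prefix is '' when the name contains no colon.
sub split_qname {
    my ($qname) = @_;
    return $qname =~ /^([^:]+):(.+)$/ ? ($1, $2) : ('', $qname);
}

# Hand-built element in the documented shape, standing in for parser output.
# That the declaration appears under the key 'xmlns:rdf' is an assumption.
my $element = {
    type    => 'e',
    name    => 'rdf:RDF',
    attrib  => { 'xmlns:rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#' },
    content => [],
};

my ($prefix, $local) = split_qname($element->{name});

# Look up the namespace URI declared on this element itself, if any.
my $uri = $element->{attrib}{"xmlns:$prefix"};

print "prefix=$prefix local=$local\n";
print "uri=$uri\n" if defined $uri;
```

Anything that needs proper namespace resolution, including default
namespaces and scoping, is probably better served by one of the fuller
parsers.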

While feedback from real users about this module has been uniformly
positive and helpful, some people seem to take issue with this module
because it doesn't implement every last jot and tittle of the XML
standard and merely implements a useful subset. A very useful subset,
as it happens, which can cope with common lightweight XML-ish tasks
such as parsing the results of queries to the Amazon Web Services.
Many, perhaps most, users of XML do not in fact need a full
implementation of the standard, and are understandably reluctant to
install large complex pieces of software which have many dependencies.
In fact, when they realise what installing and using a full
implementation entails, they quite often don't *want* it. Another
class of users, people distributing applications, often cannot rely on
users being able to install modules from the CPAN, or even having tools
like make or a shell available. XML::Tiny exists for those people.

I welcome feedback about my code, including constructive criticism.
Bug reports should be made using <http://rt.cpan.org/> or by email, and
should include the smallest possible chunk of code, along with any
necessary XML data, which demonstrates the bug. Ideally, this will be
in the form of a file which I can drop in to the module's test suite.
Please note that such files must work in perl 5.004.

For more capable XML parsers:
    XML::Parser

    XML::Parser::EasyTree

    XML::Tiny::DOM

The requirements for a module to be Tiny
    <http://beta.nntp.perl.org/group/perl.datetime/2007/01/msg6584.html>

David Cantrell <david@cantrell.org.uk>

Thanks to David Romano for some compatibility patches for Ye Aunciente
Perl;

to Matt Knecht and David Romano for prodding me to support attributes,
and to Matt for providing code to implement it in a quick n dirty
minimal kind of way;

to the people on <http://use.perl.org/> and elsewhere who have been
kind enough to point out ways it could be improved;

to Sergio Fanchiotti for pointing out a bug in handling self-closing
tags, for reporting another bug that I introduced when fixing the first
one, and for providing a patch to improve error reporting;

to 'Corion' for finding a bug with localised filehandles and providing
a fix;

to Diab Jerius for spotting that element and attribute names can begin
with an underscore;

to Nick Dumas for finding a bug when attribs have their quoting
character in CDATA, and providing a patch;

to Mathieu Longtin for pointing out that BOMs exist.

Copyright 2007-2010 David Cantrell <david@cantrell.org.uk>

This software is free-as-in-speech software, and may be used,
distributed, and modified under the terms of either the GNU General
Public Licence version 2 or the Artistic Licence. It's up to you which
one you use. The full text of the licences can be found in the files
GPL2.txt and ARTISTIC.txt, respectively.

This module is also free-as-in-mason software.

perl v5.36.0 2023-01-20 XML::Tiny(3)