XML::Tiny(3pm)

XML::Tiny(3)          User Contributed Perl Documentation         XML::Tiny(3)

NAME

       XML::Tiny - simple lightweight parser for a subset of XML

DESCRIPTION

       XML::Tiny is a simple lightweight parser for a subset of XML.

SYNOPSIS

           use XML::Tiny qw(parsefile);
           open(my $xmlfile, '<', 'something.xml')
               or die "Can't open something.xml: $!";
           my $document = parsefile($xmlfile);

       This will leave $document looking something like this:

           [
               {
                   type   => 'e',
                   attrib => { ... },
                   name   => 'rootelementname',
                   content => [
                       ...
                       more elements and text content
                       ...
                   ]
               }
           ]

FUNCTIONS

       The "parsefile" function is optionally exported.  By default nothing is
       exported.  There is no objecty interface.

   parsefile
       This takes at least one parameter, optionally more.  The compulsory
       parameter may be:

       a filename
           in which case the file is read and parsed;

       a string of XML
           in which case it is read and parsed.  How do we tell if we've got a
           string or a filename?  If it begins with "_TINY_XML_STRING_" then
           it's a string.  That prefix is, of course, ignored when it comes to
           actually parsing the data.  This is intended primarily for use by
           wrappers which want to retain compatibility with Ye Aunciente Perl.
           Normal users who want to pass in a string would be expected to use
           IO::Scalar.

       a glob-ref or IO::Handle object
           in which case again, the file is read and parsed.

       Passing a filename is supported for compatibility with older perls, but
       makes no attempt to properly deal with character sets.  If you open a
       file in a character-set-friendly way and then pass in a handle /
       object, then the function should Do The Right Thing as it only ever
       works with character data.

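       The rule for telling the three kinds of first argument apart can be
       sketched as below.  This is a minimal illustration of the rules just
       described, not XML::Tiny's own code: classify_source() and
       string_payload() are hypothetical names, and treating any reference
       as a handle is an assumption about how glob-refs and IO::Handle
       objects are recognised.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented rule for parsefile's first argument.
# Names are made up for illustration; XML::Tiny's own code may differ.
use constant STRING_PREFIX => '_TINY_XML_STRING_';

sub classify_source {
    my $source = shift;
    # Assumption: any reference (glob-ref, IO::Handle object) is a handle
    return 'handle' if ref $source;
    # A leading _TINY_XML_STRING_ marks a literal string of XML
    return 'string' if index($source, STRING_PREFIX) == 0;
    # Anything else is treated as a filename
    return 'filename';
}

# Strip the marker so that only the XML itself would be parsed
sub string_payload {
    my $source = shift;
    return substr($source, length(STRING_PREFIX));
}

print classify_source('something.xml'), "\n";            # filename
print classify_source(STRING_PREFIX . '<doc/>'), "\n";   # string
print classify_source(\*STDIN), "\n";                    # handle
print string_payload(STRING_PREFIX . '<doc/>'), "\n";    # <doc/>
```

       On perl 5.8 and later, an in-memory filehandle opened with
       open(my $fh, '<', \$xml) is an alternative to IO::Scalar for passing
       a string in as a handle.
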
       The remaining parameters are a list of key/value pairs which form a
       hash of options:

       fatal_declarations
           If set to true, <!ENTITY...> and <!DOCTYPE...> declarations in the
           document are fatal errors - otherwise they are *ignored*.

       no_entity_parsing
           If set to true, the five built-in entities are passed through
           unparsed.  Note that special characters in CDATA and attributes may
           have been turned into "&amp;", "&lt;" and friends.

       strict_entity_parsing
           If set to true, any unrecognised entities (ie, those outside the
           core five plus numeric entities) cause a fatal error.  If you set
           both this and "no_entity_parsing" (but why would you do that?) then
           the latter takes precedence.

           Obviously, if you want to maximise compliance with the XML spec,
           you should turn on fatal_declarations and strict_entity_parsing.

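       How the two entity options interact can be sketched as follows.  This
       is a hypothetical helper, decode_entities_with(), written only to
       illustrate the behaviour described above (core and numeric entities
       decoded, unrecognised ones left alone unless strict_entity_parsing is
       set, no_entity_parsing taking precedence).  It is not XML::Tiny's
       implementation and ignores fatal_declarations.

```perl
use strict;
use warnings;

# Hypothetical illustration of the documented entity options;
# not XML::Tiny's own implementation.
my %CORE = (amp => '&', lt => '<', gt => '>', apos => "'", quot => '"');

# Decode one entity body (the part between '&' and ';')
sub decode_one {
    my ($ent, $strict) = @_;
    return chr(hex $1) if $ent =~ /^#x([0-9A-Fa-f]+)$/;  # hexadecimal numeric
    return chr($1)     if $ent =~ /^#([0-9]+)$/;         # decimal numeric
    return $CORE{$ent} if exists $CORE{$ent};            # one of the core five
    die "unrecognised entity &$ent;\n" if $strict;       # strict: fatal error
    return "&$ent;";                                     # default: leave alone
}

sub decode_entities_with {
    my ($text, %opts) = @_;
    # no_entity_parsing takes precedence over strict_entity_parsing
    return $text if $opts{no_entity_parsing};
    # A single pass, so "&amp;lt;" becomes "&lt;" rather than "<"
    $text =~ s/&(#x[0-9A-Fa-f]+|#[0-9]+|\w+);/decode_one($1, $opts{strict_entity_parsing})/ge;
    return $text;
}

print decode_entities_with('&lt;b&gt; &#65;&#x41; &amp;amp;'), "\n";  # <b> AA &amp;
print decode_entities_with('&nbsp;'), "\n";                           # &nbsp;
print decode_entities_with('&lt;', no_entity_parsing => 1), "\n";     # &lt;
```
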
       The function returns a structure describing the document.  This
       contains one or more nodes, each being either an 'element' node or a
       'text' node.  The structure is an arrayref which contains a single
       'element' node which represents the document entity.  The arrayref is
       redundant, but exists for compatibility with XML::Parser::EasyTree.

       Element nodes are hashrefs with the following keys:

       type
           The node's type, represented by the letter 'e'.

       name
           The element's name.

       attrib
           A hashref containing the element's attributes, as key/value pairs
           where the key is the attribute name.

       content
           An arrayref of the element's contents.  The array's contents are a
           list of nodes, in the order they were encountered in the document.

       Text nodes are hashrefs with the following keys:

       type
           The node's type, represented by the letter 't'.

       content
           A scalar piece of text.

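       A structure of this shape is usually walked recursively.  The sketch
       below works on a hand-built structure in the documented format, as if
       parsefile had read <doc id="1">Hello <b>world</b>!</doc> (real output
       may differ in whitespace handling); no parsing happens here.  text_of()
       and elements_named() are hypothetical helpers, not part of XML::Tiny.

```perl
use strict;
use warnings;

# Hand-built structure in the documented shape; no parsing happens here
my $document = [
    {
        type    => 'e',
        name    => 'doc',
        attrib  => { id => '1' },
        content => [
            { type => 't', content => 'Hello ' },
            {
                type    => 'e',
                name    => 'b',
                attrib  => {},
                content => [ { type => 't', content => 'world' } ],
            },
            { type => 't', content => '!' },
        ],
    },
];

# Hypothetical helper: concatenate all text below a node, in document order
sub text_of {
    my $node = shift;
    return $node->{content} if $node->{type} eq 't';
    return join '', map { text_of($_) } @{ $node->{content} };
}

# Hypothetical helper: collect every element node with the given name
sub elements_named {
    my ($node, $name) = @_;
    return () unless $node->{type} eq 'e';
    my @found = $node->{name} eq $name ? ($node) : ();
    push @found, elements_named($_, $name) for @{ $node->{content} };
    return @found;
}

my $root = $document->[0];    # the arrayref wraps a single root element
print "$root->{name} id=$root->{attrib}{id}\n";    # doc id=1
print text_of($root), "\n";                         # Hello world!
my @bold = elements_named($root, 'b');
print scalar(@bold), "\n";                          # 1
```
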
       If you prefer a DOMmish interface, then look at XML::Tiny::DOM on the
       CPAN.

COMPATIBILITY

   With other modules
       The "parsefile" function is so named because it is intended to work in
       a similar fashion to XML::Parser with the XML::Parser::EasyTree style.
       Instead of saying this:

         use XML::Parser;
         use XML::Parser::EasyTree;
         $XML::Parser::EasyTree::Noempty = 1;
         my $p = XML::Parser->new(Style => 'EasyTree');
         my $tree = $p->parsefile('something.xml');

       you would say:

         use XML::Tiny;
         my $tree = XML::Tiny::parsefile('something.xml');

       Any valid document that XML::Tiny can parse this way should produce
       identical results when parsed with XML::Parser::EasyTree as in the
       example above.

       If you find a document where that is not the case, please report it as
       a bug.

   With perl 5.004
       The module is intended to be fully compatible with every version of
       perl back to and including 5.004, and may be compatible with even older
       versions of perl 5.

       The lack of Unicode and friends in older perls means that XML::Tiny
       does nothing with character sets.  If you have a document with a funny
       character set, then you will need to open the file in an appropriate
       mode using a character-set-friendly perl and pass the resulting file
       handle to the module.  BOMs are ignored.

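       A minimal sketch of opening data in a character-set-friendly way,
       assuming perl 5.8 or later for PerlIO layers and in-memory
       filehandles.  A string of UTF-8 bytes stands in for a file on disk;
       with a real file you would open the filename with the same layer and
       hand the resulting handle to parsefile.

```perl
use strict;
use warnings;

# UTF-8 bytes standing in for a file: "cafe" with an acute e (2 bytes)
my $bytes = "<p>caf\xC3\xA9</p>";

# Open with a decoding layer so that reads return characters, not bytes.
# With a real file: open(my $fh, '<:encoding(UTF-8)', $filename)
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "Can't open: $!";

# With XML::Tiny installed, $fh could be passed to parsefile($fh) here;
# instead we read it ourselves to show that it yields character data.
my $chars = do { local $/; <$fh> };
close($fh);

printf "%d bytes in, %d characters out\n", length($bytes), length($chars);
# 12 bytes in, 11 characters out
```
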
   The subset of XML that we understand
       Element tags and attributes
           Including "self-closing" tags like <pie type = 'steak n kidney' />;

       Comments
           Which are ignored;

       The five "core" entities
           ie "&amp;", "&lt;", "&gt;", "&apos;" and "&quot;";

       Numeric entities
           eg "&#65;" and "&#x41;";

       CDATA
           This is simply turned into PCDATA before parsing.  Note how this
           may interact with the various entity-handling options;

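       The idea of turning CDATA into PCDATA can be sketched as below,
       assuming the conversion escapes "&", "<" and ">" in each CDATA
       section's contents.  cdata_to_pcdata() and escape_pcdata() are
       hypothetical names and XML::Tiny's own code may differ.  The sketch
       also shows why, with no_entity_parsing set, text that came from CDATA
       can contain "&amp;" and "&lt;".

```perl
use strict;
use warnings;

# Hypothetical sketch: escape the characters that are special in PCDATA
sub escape_pcdata {
    my $text = shift;
    $text =~ s/&/&amp;/g;    # must come first, or later escapes get doubled
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Replace every CDATA section with its escaped contents (/s: may span lines)
sub cdata_to_pcdata {
    my $xml = shift;
    $xml =~ s/<!\[CDATA\[(.*?)\]\]>/escape_pcdata($1)/gse;
    return $xml;
}

my $in = '<code><![CDATA[if (a < b && c) {}]]></code>';
print cdata_to_pcdata($in), "\n";
# <code>if (a &lt; b &amp;&amp; c) {}</code>
```
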
       The following parts of the XML standard are handled incorrectly or not
       at all - this is not an exhaustive list:

       Namespaces
           While documents that use namespaces will be parsed just fine,
           there's no special treatment of them.  Their names are preserved in
           element and attribute names like 'rdf:RDF'.

       DTDs and Schemas
           This is not a validating parser.  <!DOCTYPE...> declarations are
           ignored if you've not made them fatal.

       Entities and references
           <!ENTITY...> declarations are ignored if you've not made them
           fatal.  Unrecognised entities are left untouched by default, as are
           naked & characters.  This means that if entity parsing is enabled
           you won't be able to tell the difference between "&amp;nbsp;" and
           "&nbsp;".  If your document might use any non-core entities then
           please consider using the "no_entity_parsing" option, and then use
           something like HTML::Entities.

       Processing instructions
           These are ignored.

       Whitespace
           We do not guarantee to correctly handle leading and trailing
           whitespace.

       Character sets
           This is not practical with older versions of perl.

PHILOSOPHY and JUSTIFICATION

       While feedback from real users about this module has been uniformly
       positive and helpful, some people seem to take issue with this module
       because it doesn't implement every last jot and tittle of the XML
       standard and merely implements a useful subset.  A very useful subset,
       as it happens, which can cope with common light-weight XML-ish tasks
       such as parsing the results of queries to the Amazon Web Services.
       Many, perhaps most, users of XML do not in fact need a full
       implementation of the standard, and are understandably reluctant to
       install large complex pieces of software which have many dependencies.
       In fact, when they realise what installing and using a full
       implementation entails, they quite often don't *want* it.  Another
       class of users, people distributing applications, often cannot rely on
       users being able to install modules from the CPAN, or even having tools
       like make or a shell available.  XML::Tiny exists for those people.

BUGS and FEEDBACK

       I welcome feedback about my code, including constructive criticism.
       Bug reports should be made using <http://rt.cpan.org/> or by email, and
       should include the smallest possible chunk of code, along with any
       necessary XML data, which demonstrates the bug.  Ideally, this will be
       in the form of a file which I can drop into the module's test suite.
       Please note that such files must work in perl 5.004.

AUTHOR, COPYRIGHT and LICENCE

       David Cantrell <david@cantrell.org.uk>

       Thanks to David Romano for some compatibility patches for Ye Aunciente
       Perl;

       to Matt Knecht and David Romano for prodding me to support attributes,
       and to Matt for providing code to implement it in a quick n dirty
       minimal kind of way;

       to the people on <http://use.perl.org/> and elsewhere who have been
       kind enough to point out ways it could be improved;

       to Sergio Fanchiotti for pointing out a bug in handling self-closing
       tags, for reporting another bug that I introduced when fixing the first
       one, and for providing a patch to improve error reporting;

       to 'Corion' for finding a bug with localised filehandles and providing
       a fix;

       to Diab Jerius for spotting that element and attribute names can begin
       with an underscore;

       to Nick Dumas for finding a bug when attribs have their quoting
       character in CDATA, and providing a patch;

       to Mathieu Longtin for pointing out that BOMs exist.

       Copyright 2007-2010 David Cantrell <david@cantrell.org.uk>

       This software is free-as-in-speech software, and may be used,
       distributed, and modified under the terms of either the GNU General
       Public Licence version 2 or the Artistic Licence.  It's up to you which
       one you use.  The full text of the licences can be found in the files
       GPL2.txt and ARTISTIC.txt, respectively.

CONSPIRACY

       This module is also free-as-in-mason software.

perl v5.34.0                      2022-01-21                      XML::Tiny(3)