1LibXML(3) User Contributed Perl Documentation LibXML(3)
2
3
4
6 XML::LibXML - Perl Binding for libxml2
7
9 use XML::LibXML;
10 my $dom = XML::LibXML->load_xml(string => <<'EOT');
11 <some-xml/>
12 EOT
13
14 $Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;
15 $Version_ID = XML::LibXML::LIBXML_VERSION;
16 $DLL_Version = XML::LibXML::LIBXML_RUNTIME_VERSION;
17 $libxmlnode = XML::LibXML->import_GDOME( $node, $deep );
18 $gdomenode = XML::LibXML->export_GDOME( $node, $deep );
19
This module is an interface to libxml2, providing XML and HTML parsers
with DOM, SAX and XMLReader interfaces, a large subset of the DOM Level 3
interface, and an XML::XPath-like interface to the XPath API of libxml2.
The module is split into several packages which are not described in
this section; unless stated otherwise, you only need to "use
XML::LibXML;" in your programs.
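
As a rough illustration of the DOM and XPath-like interfaces mentioned
above, the following sketch parses a small in-memory document and
queries it. The XML content and the variable names are made up for the
example; only load_xml, documentElement, nodeName, findnodes,
findvalue and textContent are taken from the XML::LibXML API.

```perl
use strict;
use warnings;
use XML::LibXML;

# Parse a small in-memory document (content invented for this example).
my $dom = XML::LibXML->load_xml(string => <<'EOT');
<catalog>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</catalog>
EOT

# DOM-style access: fetch the root element and print its name.
my $root = $dom->documentElement;
print "root: ", $root->nodeName, "\n";

# XPath-style access: findnodes() returns the matching nodes,
# findvalue() returns the string value of the expression.
my @titles = map { $_->textContent } $dom->findnodes('/catalog/book/title');
print "titles: @titles\n";

my $second_id = $dom->findvalue('/catalog/book[2]/@id');
print "second id: $second_id\n";
```

Loading "XML::LibXML" alone is enough here; the node classes listed
further down are pulled in automatically.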
27
28 For further information, please check the following documentation:
29
30 XML::LibXML::Parser
31 Parsing XML files with XML::LibXML
32
33 XML::LibXML::DOM
34 XML::LibXML Document Object Model (DOM) Implementation
35
36 XML::LibXML::SAX
37 XML::LibXML direct SAX parser
38
39 XML::LibXML::Reader
40 Reading XML with a pull-parser
41
42 XML::LibXML::Dtd
43 XML::LibXML frontend for DTD validation
44
45 XML::LibXML::RelaxNG
46 XML::LibXML frontend for RelaxNG schema validation
47
48 XML::LibXML::Schema
49 XML::LibXML frontend for W3C Schema schema validation
50
51 XML::LibXML::XPathContext
52 API for evaluating XPath expressions with enhanced support for the
53 evaluation context
54
55 XML::LibXML::InputCallback
56 Implementing custom URI Resolver and input callbacks
57
58 XML::LibXML::Common
59 Common functions for XML::LibXML related Classes
60
61 The nodes in the Document Object Model (DOM) are represented by the
62 following classes (most of which "inherit" from XML::LibXML::Node):
63
64 XML::LibXML::Document
65 XML::LibXML class for DOM document nodes
66
67 XML::LibXML::Node
68 Abstract base class for XML::LibXML DOM nodes
69
70 XML::LibXML::Element
71 XML::LibXML class for DOM element nodes
72
73 XML::LibXML::Text
74 XML::LibXML class for DOM text nodes
75
76 XML::LibXML::Comment
77 XML::LibXML class for comment DOM nodes
78
79 XML::LibXML::CDATASection
80 XML::LibXML class for DOM CDATA sections
81
82 XML::LibXML::Attr
83 XML::LibXML DOM attribute class
84
85 XML::LibXML::DocumentFragment
86 XML::LibXML's DOM L2 Document Fragment implementation
87
88 XML::LibXML::Namespace
89 XML::LibXML DOM namespace nodes
90
91 XML::LibXML::PI
92 XML::LibXML DOM processing instruction nodes
93
Recall that since version 5.6.1, Perl distinguishes between character
strings (internally encoded in UTF-8) and so-called binary data and,
accordingly, applies either character or byte semantics to them. A
scalar representing a character string is distinguished from a byte
string by a special flag (UTF8). Please refer to perlunicode for
details.
101
102 XML::LibXML's API is designed to deal with many encodings of XML
103 documents completely transparently, so that the application using
104 XML::LibXML can be completely ignorant about the encoding of the XML
105 documents it works with. On the other hand, functions like
106 "XML::LibXML::Document->setEncoding" give the user control over the
107 document encoding.
108
109 To ensure the aforementioned transparency and uniformity, most
110 functions of XML::LibXML that work with in-memory trees accept and
111 return data as character strings (i.e. UTF-8 encoded with the UTF8 flag
112 on) regardless of the original document encoding; however, the
113 functions related to I/O operations (i.e. parsing and saving) operate
114 with binary data (in the original document encoding) obeying the
115 encoding declaration of the XML documents.
116
117 Below we summarize basic rules and principles regarding encoding:
118
1. Do NOT apply any encoding-related PerlIO layers (":utf8" or
":encoding(...)") to file handles that are an input for the parser
or an output for a serializer of (full) XML documents. This is
because the conversion of the data to/from the internal character
representation is provided by libxml2 itself, which must be able to
enforce the encoding specified by the "<?xml version="1.0"
encoding="..."?>" declaration. Here is an example to follow:
126
    use XML::LibXML;
    open my $fh, '<', 'file.xml' or die "Cannot open file.xml: $!";
    binmode $fh; # drop all PerlIO layers possibly created by a use open pragma
    my $doc = XML::LibXML->load_xml(IO => $fh);
    open my $out, '>', 'out.xml' or die "Cannot open out.xml: $!";
    binmode $out; # as above
    $doc->toFh($out);
    # or
    print {$out} $doc->toString();
136
2. All functions working with DOM accept and return character strings
(UTF-8 encoded with UTF8 flag on). E.g.

    my $doc = XML::LibXML::Document->new('1.0', $some_encoding);
    my $element = $doc->createElement($name);
    $element->appendText($text);
    $xml_fragment = $element->toString(); # returns a character string
    $xml_document = $doc->toString(); # returns a byte string

where $some_encoding is the document encoding that will be used
when saving the document, and $name and $text contain character
strings (UTF-8 encoded with UTF8 flag on). Note that the method
"toString" returns XML as a character string if applied to a node
other than the Document node, and a byte string containing the
appropriate

    <?xml version="1.0" encoding="..."?>

declaration if applied to an XML::LibXML::Document.
156
3. DOM methods also accept binary strings in the original encoding of
the document to which the node belongs (UTF-8 is assumed if the
node is not attached to any document). Exploiting this feature is
NOT RECOMMENDED since it is considered bad practice.

    my $doc = XML::LibXML::Document->new('1.0', 'iso-8859-2');
    my $text = $doc->createTextNode($some_latin2_encoded_byte_string);
    # WORKS, BUT NOT RECOMMENDED!
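
The difference between character strings and byte strings described in
rules 2 and 3 can be observed with a short sketch like the one below.
It assumes that the local libxml2 build supports ISO-8859-2; the
element name and the text are invented for the example, and
utf8::is_utf8 is used only to inspect the UTF8 flag.

```perl
use strict;
use warnings;
use XML::LibXML;
use Encode qw(decode);

# Create a document that will be saved in ISO-8859-2.
my $doc  = XML::LibXML::Document->new('1.0', 'iso-8859-2');
my $root = $doc->createElement('word');
$doc->setDocumentElement($root);

# Pass a character string: U+010D is LATIN SMALL LETTER C WITH CARON.
$root->appendText("\x{10D}aj");

# Serializing an element yields a character string ...
my $fragment = $root->toString();
# ... while serializing the document yields bytes in the declared encoding.
my $bytes = $doc->toString();

print "fragment has UTF8 flag: ", (utf8::is_utf8($fragment) ? "yes" : "no"), "\n";
print "document has UTF8 flag: ", (utf8::is_utf8($bytes)    ? "yes" : "no"), "\n";

# To inspect the serialized document as characters, decode it explicitly.
my $decoded = decode('iso-8859-2', $bytes);
```

When writing $bytes to a file, the file handle should stay in binary
mode, as in rule 1.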
165
166 NOTE: libxml2 support for many encodings is based on the iconv library.
167 The actual list of supported encodings may vary from platform to
168 platform. To test if your platform works correctly with your language
169 encoding, build a simple document in the particular encoding and try to
170 parse it with XML::LibXML to see if the parser produces any errors.
171 Occasional crashes were reported on rare platforms that ship with a
172 broken version of iconv.
173
XML::LibXML since 1.67 partially supports Perl threads in Perl >=
5.8.8. XML::LibXML can be used with threads in two ways:

By default, all XML::LibXML classes use the CLONE_SKIP class method to
prevent Perl from copying XML::LibXML::* objects when a new thread is
spawned. In this mode, all XML::LibXML::* objects are thread specific.
This is the safest way to work with XML::LibXML in threads.
182
183 Alternatively, one may use
184
185 use threads;
186 use XML::LibXML qw(:threads_shared);
187
to indicate that all XML::LibXML node and parser objects should be
shared between the main thread and any thread spawned from there. For
example, in
191
192 my $doc = XML::LibXML->load_xml(location => $filename);
193 my $thr = threads->new(sub{
194 # code working with $doc
195 1;
196 });
197 $thr->join;
198
199 the variable $doc refers to the exact same XML::LibXML::Document in the
200 spawned thread as in the main thread.
201
Without using mutex locks, parallel threads may read the same document
(i.e. any node that belongs to the document), parse files, and modify
different documents.

However, if there is a chance that some of the threads will attempt to
modify a document (or even create new nodes based on that document,
e.g. with "$doc->createElement") that other threads may be reading at
the same time, the user is responsible for creating a mutex lock and
using it both in the thread that modifies and in the thread that reads:
211
    use threads;
    use threads::shared;    # provides the ":shared" attribute and lock()
    use XML::LibXML qw(:threads_shared);

    my $doc = XML::LibXML->load_xml(location => $filename);
    my $mutex : shared;
    my $thr = threads->new(sub{
        lock $mutex;        # held until the end of the thread's sub
        my $el = $doc->createElement('foo');
        # ...
        1;
    });
    {
        lock $mutex;        # held until the end of this block
        my $root = $doc->documentElement;
        print $root->nodeName, "\n";
    }
    $thr->join;
226
Note that libxml2 uses dictionaries to store short strings and these
dictionaries are kept on a document node. Without mutex locks, it could
happen in the previous example that the thread modifies the dictionary
while other threads attempt to read from it, which could easily lead to
a crash.
232
Sometimes it is useful to find out which version of libxml2 XML::LibXML
was compiled against. In most cases this is for debugging or to check
whether a given installation provides all the functionality the package
needs. The functions XML::LibXML::LIBXML_DOTTED_VERSION and
XML::LibXML::LIBXML_VERSION provide this version information. Both
functions simply pass through the values of the similarly named macros
of libxml2. Similarly, XML::LibXML::LIBXML_RUNTIME_VERSION returns the
version of the (usually dynamically) linked libxml2.
242
243 XML::LibXML::LIBXML_DOTTED_VERSION
244 $Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;
245
246 Returns the version string of the libxml2 version XML::LibXML was
247 compiled for. This will be "2.6.2" for "libxml2 2.6.2".
248
249 XML::LibXML::LIBXML_VERSION
250 $Version_ID = XML::LibXML::LIBXML_VERSION;
251
Returns the version id of the libxml2 version XML::LibXML was
compiled for. This will be "20602" for "libxml2 2.6.2". Do not
confuse this version id with $XML::LibXML::VERSION. The latter
contains the version of XML::LibXML itself, while the former contains
the version of libxml2 that XML::LibXML was compiled for.
257
258 XML::LibXML::LIBXML_RUNTIME_VERSION
259 $DLL_Version = XML::LibXML::LIBXML_RUNTIME_VERSION;
260
261 Returns a version string of the libxml2 which is (usually
262 dynamically) linked by XML::LibXML. This will be "20602" for
263 libxml2 released as "2.6.2" and something like "20602-CVS2032" for
264 a CVS build of libxml2.
265
266 XML::LibXML issues a warning if the version of libxml2 dynamically
267 linked to it is less than the version of libxml2 which it was
268 compiled against.
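
A minimal sketch of how these three functions might be used for a
start-up check follows. The threshold in $minimum is a hypothetical
requirement chosen for the example, and the regular expression simply
assumes that the runtime version starts with the numeric id, as in the
"20602-CVS2032" form described above.

```perl
use strict;
use warnings;
use XML::LibXML;

my $dotted   = XML::LibXML::LIBXML_DOTTED_VERSION();   # e.g. "2.6.2"
my $compiled = XML::LibXML::LIBXML_VERSION();          # e.g. 20602
my $runtime  = XML::LibXML::LIBXML_RUNTIME_VERSION();  # may carry a suffix

# Keep only the leading digits so the runtime value compares numerically.
my ($runtime_id) = $runtime =~ /^(\d+)/;

print "XML::LibXML $XML::LibXML::VERSION\n";
print "compiled against libxml2 $dotted ($compiled), running with $runtime\n";

my $minimum = 20600;    # hypothetical minimum libxml2 version id
if (defined $runtime_id && $runtime_id < $minimum) {
    warn "libxml2 $runtime is older than the assumed minimum $minimum\n";
}
```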
269
271 By default the module exports all constants and functions listed in the
272 :all tag, described below.
273
275 ":all"
276 Includes the tags ":libxml", ":encoding", and ":ns" described
277 below.
278
279 ":libxml"
280 Exports integer constants for DOM node types.
281
282 XML_ELEMENT_NODE => 1
283 XML_ATTRIBUTE_NODE => 2
284 XML_TEXT_NODE => 3
285 XML_CDATA_SECTION_NODE => 4
286 XML_ENTITY_REF_NODE => 5
287 XML_ENTITY_NODE => 6
288 XML_PI_NODE => 7
289 XML_COMMENT_NODE => 8
290 XML_DOCUMENT_NODE => 9
291 XML_DOCUMENT_TYPE_NODE => 10
292 XML_DOCUMENT_FRAG_NODE => 11
293 XML_NOTATION_NODE => 12
294 XML_HTML_DOCUMENT_NODE => 13
295 XML_DTD_NODE => 14
296 XML_ELEMENT_DECL => 15
297 XML_ATTRIBUTE_DECL => 16
298 XML_ENTITY_DECL => 17
299 XML_NAMESPACE_DECL => 18
300 XML_XINCLUDE_START => 19
301 XML_XINCLUDE_END => 20
302
303 ":encoding"
304 Exports two encoding conversion functions from XML::LibXML::Common.
305
306 encodeToUTF8()
307 decodeFromUTF8()
308
309 ":ns"
310 Exports two convenience constants: the implicit namespace of the
311 reserved "xml:" prefix, and the implicit namespace for the reserved
312 "xmlns:" prefix.
313
314 XML_XML_NS => 'http://www.w3.org/XML/1998/namespace'
315 XML_XMLNS_NS => 'http://www.w3.org/2000/xmlns/'
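
To show how the exported constants are typically used, the sketch
below classifies the children of an element by node type and reads an
attribute in the reserved "xml:" namespace. The sample document and
the %seen hash are invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml :ns);

my $doc = XML::LibXML->load_xml(string => <<'EOT');
<doc xml:lang="en"><!-- note -->text<child/></doc>
EOT

# Count the children of the root element by node type.
my %seen;
for my $node ($doc->documentElement->childNodes) {
    my $type = $node->nodeType;
    if    ($type == XML_ELEMENT_NODE) { $seen{element}++ }
    elsif ($type == XML_TEXT_NODE)    { $seen{text}++ }
    elsif ($type == XML_COMMENT_NODE) { $seen{comment}++ }
}
print join(", ", map { "$_=$seen{$_}" } sort keys %seen), "\n";

# XML_XML_NS is the namespace URI bound to the reserved "xml:" prefix.
my $lang = $doc->documentElement->getAttributeNS(XML_XML_NS, 'lang');
print "xml:lang is $lang\n";
```

Comparing against the named constants rather than bare integers keeps
such code readable.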
316
318 The modules described in this section are not part of the XML::LibXML
319 package itself. As they support some additional features, they are
320 mentioned here.
321
322 XML::LibXSLT
323 XSLT 1.0 Processor using libxslt and XML::LibXML
324
325 XML::LibXML::Iterator
326 XML::LibXML Implementation of the DOM Traversal Specification
327
328 XML::CompactTree::XS
Uses XML::LibXML::Reader to parse an XML document or element very
efficiently into native Perl data structures, which are less
flexible but significantly faster to process than DOM.
332
334 Note: THE FUNCTIONS DESCRIBED HERE ARE STILL EXPERIMENTAL
335
Although XML::LibXML and XML::GDOME both make use of libxml2's XML
capabilities, the DOM implementations of the two modules are not
compatible. It is still possible, however, to exchange nodes from one
DOM to the other. The concept of this exchange is similar to the
function cloneNode(): the particular node is copied at the low level
to the opposite DOM implementation.
342
Since the DOM implementations cannot coexist within one document, one
is forced to copy each node that should be used. Because two copies of
each node are always kept, this may have a considerable impact on a
machine's memory usage.
347
XML::LibXML provides two functions to export or import GDOME nodes:
import_GDOME() and export_GDOME(). Both functions take two parameters:
the node and a flag for recursive import. The flag works as in
cloneNode().
352
The two functions allow XML::GDOME nodes to be exported and imported
explicitly; however, XML::LibXML also allows the transparent import of
XML::GDOME nodes in functions such as appendChild(), insertAfter() and
so on. While native nodes are automatically adopted in most functions,
XML::GDOME nodes are always cloned in advance. Thus, if the original
node is modified after the operation, the node in the XML::LibXML
document will not reflect that change.
360
361 import_GDOME
362 $libxmlnode = XML::LibXML->import_GDOME( $node, $deep );
363
This clones an XML::GDOME node to an XML::LibXML node explicitly.
365
366 export_GDOME
367 $gdomenode = XML::LibXML->export_GDOME( $node, $deep );
368
This clones an XML::LibXML node into an XML::GDOME node.
370
372 For bug reports, please use the CPAN request tracker on
373 http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-LibXML
374
For suggestions and other issues related to XML::LibXML, you may
use the perl XML mailing list ("perl-xml@listserv.ActiveState.com"),
where most XML-related Perl modules are discussed. In case of problems
you should check the archives of that list first, since many problems
have already been discussed there. You can find the list's archives and
subscription options at
<http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml>.
383
385 Matt Sergeant, Christian Glahn, Petr Pajas
386
388 1.70
389
391 2001-2007, AxKit.com Ltd.
392
393 2002-2006, Christian Glahn.
394
395 2006-2009, Petr Pajas.
396
397
398
399perl v5.12.0 2009-10-07 LibXML(3)