LibXML(3) User Contributed Perl Documentation LibXML(3)

NAME
XML::LibXML - Perl Binding for libxml2

SYNOPSIS
    use XML::LibXML;
    my $dom = XML::LibXML->load_xml(string => <<'EOT');
    <some-xml/>
    EOT

    $Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;
    $Version_ID = XML::LibXML::LIBXML_VERSION;
    $DLL_Version = XML::LibXML::LIBXML_RUNTIME_VERSION;
    $libxmlnode = XML::LibXML->import_GDOME( $node, $deep );
    $gdomenode = XML::LibXML->export_GDOME( $node, $deep );

DESCRIPTION
This module is an interface to libxml2, providing XML and HTML parsers
with DOM, SAX and XMLReader interfaces, a large subset of the DOM Level 3
interface, and an XML::XPath-like interface to the XPath API of libxml2.
The module is split into several packages which are not described in
this section; unless stated otherwise, you only need to "use
XML::LibXML;" in your programs.
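
As a minimal sketch of how the DOM and XPath interfaces fit together, the following parses a small document, queries it, and extends the tree. The sample XML and all variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Parse a small in-memory document (hypothetical sample data).
my $dom = XML::LibXML->load_xml(string => <<'EOT');
<library>
  <book id="b1"><title>Programming Perl</title></book>
  <book id="b2"><title>Learning XML</title></book>
</library>
EOT

# XPath interface: findnodes() returns the matching nodes,
# findvalue() returns the string value of the result.
my @titles = map { $_->textContent } $dom->findnodes('/library/book/title');
my $second = $dom->findvalue('/library/book[@id="b2"]/title');

# DOM interface: create a new element and attach it to the tree.
my $book = $dom->createElement('book');
$book->setAttribute(id => 'b3');
$book->appendTextChild(title => 'XML in a Nutshell');
$dom->documentElement->appendChild($book);

my $count = $dom->findvalue('count(/library/book)');
print "$_\n" for @titles;
print "second: $second, books: $count\n";
```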

For further information, please check the following documentation:

XML::LibXML::Parser
    Parsing XML files with XML::LibXML

XML::LibXML::DOM
    XML::LibXML Document Object Model (DOM) Implementation

XML::LibXML::SAX
    XML::LibXML direct SAX parser

XML::LibXML::Reader
    Reading XML with a pull-parser

XML::LibXML::Dtd
    XML::LibXML frontend for DTD validation

XML::LibXML::RelaxNG
    XML::LibXML frontend for RelaxNG schema validation

XML::LibXML::Schema
    XML::LibXML frontend for W3C XML Schema validation

XML::LibXML::XPathContext
    API for evaluating XPath expressions with enhanced support for the
    evaluation context

XML::LibXML::InputCallback
    Implementing custom URI resolvers and input callbacks

XML::LibXML::Common
    Common functions for XML::LibXML related classes
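
To illustrate what "enhanced support for the evaluation context" can mean in practice, here is a small sketch that registers a namespace prefix with XML::LibXML::XPathContext. The Atom snippet and the prefix "atom" are arbitrary choices for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string =>
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>News</title></feed>');

# XPath 1.0 has no notion of a default namespace, so a prefix must be
# registered for the namespace URI before it can be used in expressions.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(atom => 'http://www.w3.org/2005/Atom');

my $title      = $xpc->findvalue('/atom:feed/atom:title');
my $unprefixed = $doc->findvalue('/feed/title');   # matches nothing

print "with prefix: '$title', without: '$unprefixed'\n";
```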

The nodes in the Document Object Model (DOM) are represented by the
following classes (most of which "inherit" from XML::LibXML::Node):

XML::LibXML::Document
    XML::LibXML class for DOM document nodes

XML::LibXML::Node
    Abstract base class for XML::LibXML DOM nodes

XML::LibXML::Element
    XML::LibXML class for DOM element nodes

XML::LibXML::Text
    XML::LibXML class for DOM text nodes

XML::LibXML::Comment
    XML::LibXML class for comment DOM nodes

XML::LibXML::CDATASection
    XML::LibXML class for DOM CDATA sections

XML::LibXML::Attr
    XML::LibXML DOM attribute class

XML::LibXML::DocumentFragment
    XML::LibXML's DOM L2 Document Fragment implementation

XML::LibXML::Namespace
    XML::LibXML DOM namespace nodes

XML::LibXML::PI
    XML::LibXML DOM processing instruction nodes
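
The mapping from node kinds to classes can be observed directly with ref(). The sketch below uses a contrived one-line document that contains one child of each common kind.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string =>
    '<root attr="1">text<!-- note --><![CDATA[raw]]><?target data?><child/></root>');

my $root = $doc->documentElement;

# Class of each child of <root>, in document order.
my @classes    = map { ref } $root->childNodes;
my $attr_class = ref $root->getAttributeNode('attr');

print "document:  ", ref($doc), "\n";
print "attribute: $attr_class\n";
print "child:     $_\n" for @classes;
```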

ENCODINGS SUPPORT IN XML::LIBXML
Recall that since version 5.6.1, Perl distinguishes between character
strings (internally encoded in UTF-8) and so-called binary data and,
accordingly, applies either character or byte semantics to them. A
scalar representing a character string is distinguished from a byte
string by a special flag (UTF8). Please refer to perlunicode for
details.

XML::LibXML's API is designed to deal with many encodings of XML
documents completely transparently, so that an application using
XML::LibXML need not know the encoding of the XML documents it works
with. On the other hand, functions like
"XML::LibXML::Document->setEncoding" give the user control over the
document encoding.

To ensure this transparency and uniformity, most functions of
XML::LibXML that work with in-memory trees accept and return data as
character strings (i.e. UTF-8 encoded with the UTF8 flag on) regardless
of the original document encoding; however, the functions related to
I/O operations (i.e. parsing and saving) operate with binary data (in
the original document encoding), obeying the encoding declaration of
the XML documents.

Below we summarize the basic rules and principles regarding encoding:

1. Do NOT apply any encoding-related PerlIO layers (":utf8" or
   ":encoding(...)") to file handles that are an input for the parser
   or an output for a serializer of (full) XML documents. This is
   because the conversion of the data to/from the internal character
   representation is provided by libxml2 itself, which must be able to
   enforce the encoding specified by the "<?xml version="1.0"
   encoding="..."?>" declaration. Here is an example to follow:

       use XML::LibXML;

       # load
       open my $fh, '<', 'file.xml' or die "Cannot open file.xml: $!";
       binmode $fh; # drop all PerlIO layers possibly created by a "use open" pragma
       my $doc = XML::LibXML->load_xml(IO => $fh);
       close $fh;

       # save
       open my $out, '>', 'out.xml' or die "Cannot open out.xml: $!";
       binmode $out; # as above
       $doc->toFH($out);
       # or
       print {$out} $doc->toString();

2. All functions working with DOM accept and return character strings
   (UTF-8 encoded with the UTF8 flag on). E.g.

       my $doc = XML::LibXML::Document->new('1.0',$some_encoding);
       my $element = $doc->createElement($name);
       $element->appendText($text);
       my $xml_fragment = $element->toString(); # returns a character string
       my $xml_document = $doc->toString(); # returns a byte string

   where $some_encoding is the document encoding that will be used
   when saving the document, and $name and $text contain character
   strings (UTF-8 encoded with the UTF8 flag on). Note that the method
   "toString" returns XML as a character string if applied to a node
   other than the Document node, and as a byte string containing the
   appropriate

       <?xml version="1.0" encoding="..."?>

   declaration if applied to an XML::LibXML::Document.

3. DOM methods also accept binary strings in the original encoding of
   the document to which the node belongs (UTF-8 is assumed if the
   node is not attached to any document). Exploiting this feature is
   NOT RECOMMENDED since it is considered bad practice.

       my $doc = XML::LibXML::Document->new('1.0','iso-8859-2');
       my $text = $doc->createTextNode($some_latin2_encoded_byte_string);
       # WORKS, BUT NOT RECOMMENDED!

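Rules 2 and 3 can be combined into one short sketch: rather than passing encoded bytes to the DOM, decode them to a character string first (here with the core Encode module, which is an addition of this example and not part of XML::LibXML), and let the serializer produce bytes in the document encoding. The input value is a hypothetical ISO-8859-2 byte string.

```perl
use strict;
use warnings;
use Encode qw(decode);
use XML::LibXML;

my $doc  = XML::LibXML::Document->new('1.0', 'iso-8859-2');
my $root = $doc->createElement('name');
$doc->setDocumentElement($root);

# Hypothetical input: "Dvorak" with diacritics as ISO-8859-2 bytes
# (0xF8 = r with caron, 0xE1 = a with acute).
my $latin2_bytes = "Dvo\xF8\xE1k";

# Recommended: decode to a character string first, then hand it to the DOM.
my $text = decode('iso-8859-2', $latin2_bytes);
$root->appendText($text);

my $xml_fragment = $root->toString();   # character string
my $xml_document = $doc->toString();    # byte string in ISO-8859-2

printf "fragment: %d characters, document: %d bytes\n",
    length $xml_fragment, length $xml_document;
```

The fragment contains the characters U+0159 and U+00E1, while the serialized document carries the original single bytes 0xF8 and 0xE1 together with the encoding declaration.
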
NOTE: libxml2 support for many encodings is based on the iconv library.
The actual list of supported encodings may vary from platform to
platform. To test if your platform works correctly with your language
encoding, build a simple document in the particular encoding and try to
parse it with XML::LibXML to see if the parser produces any errors.
Occasional crashes were reported on rare platforms that ship with a
broken version of iconv.
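
The check suggested in the note could be sketched as follows. The helper name encoding_round_trips is hypothetical; it builds a one-element document in the given encoding, serializes it, parses the result, and compares the text.

```perl
use strict;
use warnings;
use XML::LibXML;

# Returns 1 if a document serialized in $encoding can be parsed back
# with its text intact, 0 otherwise. $sample must be a character string.
sub encoding_round_trips {
    my ($encoding, $sample) = @_;
    my $ok = eval {
        my $doc  = XML::LibXML::Document->new('1.0', $encoding);
        my $root = $doc->createElement('test');
        $root->appendText($sample);
        $doc->setDocumentElement($root);

        my $bytes  = $doc->toString();                          # serialized in $encoding
        my $parsed = XML::LibXML->load_xml(string => $bytes);   # dies on parse errors
        $parsed->documentElement->textContent eq $sample;
    };
    warn "Encoding $encoding failed: $@" if !defined $ok;
    return $ok ? 1 : 0;
}

for my $encoding (qw(UTF-8 iso-8859-1)) {
    printf "%-12s %s\n", $encoding,
        encoding_round_trips($encoding, "caf\x{e9}") ? 'ok' : 'FAILED';
}
```

Replace the encoding names and the sample text with the ones relevant to your platform and language.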

THREAD SUPPORT
Since version 1.67, XML::LibXML partially supports Perl threads in
Perl >= 5.8.8. XML::LibXML can be used with threads in two ways:

By default, all XML::LibXML classes use the CLONE_SKIP class method to
prevent Perl from copying XML::LibXML::* objects when a new thread is
spawned. In this mode, all XML::LibXML::* objects are thread specific.
This is the safest way to work with XML::LibXML in threads.

Alternatively, one may use

    use threads;
    use XML::LibXML qw(:threads_shared);

to indicate that all XML::LibXML node and parser objects should be
shared between the main thread and any thread spawned from there. For
example, in

    my $doc = XML::LibXML->load_xml(location => $filename);
    my $thr = threads->new(sub{
        # code working with $doc
        1;
    });
    $thr->join;

the variable $doc refers to the exact same XML::LibXML::Document in the
spawned thread as in the main thread.

Without using mutex locks, parallel threads may read the same document
(i.e. any node that belongs to the document), parse files, and modify
different documents.

However, if there is a chance that some of the threads will attempt to
modify a document (or even create new nodes based on that document,
e.g. with "$doc->createElement") that other threads may be reading at
the same time, the user is responsible for creating a mutex lock and
using it both in the thread that modifies and in the thread that reads:

    use threads;
    use threads::shared;
    use XML::LibXML qw(:threads_shared);

    my $doc = XML::LibXML->load_xml(location => $filename);
    my $mutex : shared;
    my $thr = threads->new(sub{
        lock $mutex;
        my $el = $doc->createElement('foo');
        # ...
        1;
    });
    {
        lock $mutex;
        my $root = $doc->documentElement;
        print $root->nodeName, "\n";
    }
    $thr->join;

Note that libxml2 uses dictionaries to store short strings and these
dictionaries are kept on a document node. Without mutex locks, it could
happen in the previous example that the thread modifies the dictionary
while other threads attempt to read from it, which could easily lead to
a crash.

VERSION INFORMATION
Sometimes it is useful to find out which version of libxml2 XML::LibXML
was compiled against. In most cases this is for debugging or to check
whether a given installation provides all the functionality the package
needs. The functions XML::LibXML::LIBXML_DOTTED_VERSION and
XML::LibXML::LIBXML_VERSION provide this version information. Both
functions simply pass through the values of the similarly named macros
of libxml2. Similarly, XML::LibXML::LIBXML_RUNTIME_VERSION returns the
version of the (usually dynamically) linked libxml2.

XML::LibXML::LIBXML_DOTTED_VERSION
      $Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;

    Returns the version string of the libxml2 version XML::LibXML was
    compiled for. This will be "2.6.2" for "libxml2 2.6.2".

XML::LibXML::LIBXML_VERSION
      $Version_ID = XML::LibXML::LIBXML_VERSION;

    Returns the version id of the libxml2 version XML::LibXML was
    compiled for. This will be "20602" for "libxml2 2.6.2". Do not
    confuse this version id with $XML::LibXML::VERSION. The latter
    contains the version of XML::LibXML itself, while the former
    contains the version of libxml2 that XML::LibXML was compiled for.

XML::LibXML::LIBXML_RUNTIME_VERSION
      $DLL_Version = XML::LibXML::LIBXML_RUNTIME_VERSION;

    Returns a version string of the libxml2 which is (usually
    dynamically) linked by XML::LibXML. This will be "20602" for
    libxml2 released as "2.6.2" and something like "20602-CVS2032" for
    a CVS build of libxml2.

    XML::LibXML issues a warning if the version of libxml2 dynamically
    linked to it is less than the version of libxml2 which it was
    compiled against.
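
A short diagnostic along these lines might look as follows. It is only a sketch: the extraction of the leading digits from the runtime string assumes the "20602" or "20602-CVS2032" shapes described above.

```perl
use strict;
use warnings;
use XML::LibXML;

my $dotted  = XML::LibXML::LIBXML_DOTTED_VERSION();    # e.g. "2.6.2"
my $id      = XML::LibXML::LIBXML_VERSION();           # e.g. 20602
my $runtime = XML::LibXML::LIBXML_RUNTIME_VERSION();   # e.g. "20602" or "20602-CVS2032"

# The runtime string may carry a suffix, so compare only its leading digits.
my ($runtime_id) = $runtime =~ /^(\d+)/;

print "XML::LibXML $XML::LibXML::VERSION\n";
print "compiled against libxml2 $dotted (id $id)\n";
print "running with libxml2 id $runtime_id\n";
warn "libxml2 at run time is older than at compile time\n" if $runtime_id < $id;
```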

EXPORTS
By default the module exports all constants and functions listed in the
:all tag, described below.

EXPORT TAGS
":all"
    Includes the tags ":libxml", ":encoding", and ":ns" described
    below.

":libxml"
    Exports integer constants for DOM node types.

      XML_ELEMENT_NODE            => 1
      XML_ATTRIBUTE_NODE          => 2
      XML_TEXT_NODE               => 3
      XML_CDATA_SECTION_NODE      => 4
      XML_ENTITY_REF_NODE         => 5
      XML_ENTITY_NODE             => 6
      XML_PI_NODE                 => 7
      XML_COMMENT_NODE            => 8
      XML_DOCUMENT_NODE           => 9
      XML_DOCUMENT_TYPE_NODE      => 10
      XML_DOCUMENT_FRAG_NODE      => 11
      XML_NOTATION_NODE           => 12
      XML_HTML_DOCUMENT_NODE      => 13
      XML_DTD_NODE                => 14
      XML_ELEMENT_DECL            => 15
      XML_ATTRIBUTE_DECL          => 16
      XML_ENTITY_DECL             => 17
      XML_NAMESPACE_DECL          => 18
      XML_XINCLUDE_START          => 19
      XML_XINCLUDE_END            => 20

":encoding"
    Exports two encoding conversion functions from XML::LibXML::Common.

      encodeToUTF8()
      decodeFromUTF8()

":ns"
    Exports two convenience constants: the implicit namespace of the
    reserved "xml:" prefix, and the implicit namespace for the reserved
    "xmlns:" prefix.

      XML_XML_NS => 'http://www.w3.org/XML/1998/namespace'
      XML_XMLNS_NS => 'http://www.w3.org/2000/xmlns/'
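
The following sketch touches each of the three tags once. The sample document is invented, and the argument order shown for encodeToUTF8() and decodeFromUTF8() (encoding name first, then the string) follows the XML::LibXML::Common documentation as far as this example assumes; check it there before relying on it.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml :encoding :ns);

my $doc  = XML::LibXML->load_xml(string => '<doc xml:lang="en">hi<!--c--></doc>');
my $root = $doc->documentElement;

# :libxml - compare nodeType() against the named constants.
my @elements_or_text = grep {
    $_->nodeType == XML_ELEMENT_NODE || $_->nodeType == XML_TEXT_NODE
} $root->childNodes;

# :ns - look up an attribute in the namespace bound to the "xml" prefix.
my $lang = $root->getAttributeNS(XML_XML_NS, 'lang');

# :encoding - convert between a byte string in a given encoding and UTF-8.
my $latin1 = "caf\xE9";                             # ISO-8859-1 bytes
my $utf8   = encodeToUTF8('iso-8859-1', $latin1);
my $back   = decodeFromUTF8('iso-8859-1', $utf8);

print "lang=$lang, text-or-element children: ", scalar(@elements_or_text), "\n";
```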

RELATED MODULES
The modules described in this section are not part of the XML::LibXML
package itself. As they support some additional features, they are
mentioned here.

XML::LibXSLT
    XSLT 1.0 Processor using libxslt and XML::LibXML

XML::LibXML::Iterator
    XML::LibXML Implementation of the DOM Traversal Specification

XML::CompactTree::XS
    Uses XML::LibXML::Reader to parse an XML document or element very
    efficiently into native Perl data structures, which are less
    flexible but significantly faster to process than DOM.

XML::LIBXML AND XML::GDOME
Note: THE FUNCTIONS DESCRIBED HERE ARE STILL EXPERIMENTAL

Although XML::LibXML and XML::GDOME both make use of libxml2's XML
capabilities, their DOM implementations are not compatible. It is
still possible, however, to exchange nodes from one DOM to the other.
The concept of this exchange is similar to the function cloneNode():
the particular node is copied at the low level to the opposite DOM
implementation.

Since the DOM implementations cannot coexist within one document, one
is forced to copy each node that should be used. Because two copies of
each node are always kept, this may have a considerable impact on a
machine's memory usage.

XML::LibXML provides two functions to export or import GDOME nodes:
import_GDOME() and export_GDOME(). Both functions have two parameters:
the node and a flag for recursive import. The flag works as in
cloneNode().

The two functions allow XML::GDOME nodes to be exported and imported
explicitly; however, XML::LibXML also allows the transparent import of
XML::GDOME nodes in functions such as appendChild(), insertAfter() and
so on. While native nodes are automatically adopted in most functions,
XML::GDOME nodes are always cloned in advance. Thus, if the original
node is modified after the operation, the node in the XML::LibXML
document will not reflect the change.

import_GDOME
      $libxmlnode = XML::LibXML->import_GDOME( $node, $deep );

    This clones an XML::GDOME node to an XML::LibXML node explicitly.

export_GDOME
      $gdomenode = XML::LibXML->export_GDOME( $node, $deep );

    This clones an XML::LibXML node into an XML::GDOME node.

CONTACTS
For bug reports, please use the CPAN request tracker on
http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-LibXML

For suggestions and other issues related to XML::LibXML, you may use
the perl XML mailing list ("perl-xml@listserv.ActiveState.com"), where
most XML-related Perl modules are discussed. In case of problems, check
the archives of that list first, since many problems have already been
discussed there. You can find the list's archives and subscription
options at
<http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml>.

AUTHORS
Matt Sergeant, Christian Glahn, Petr Pajas

VERSION
2.0018

COPYRIGHT
2001-2007, AxKit.com Ltd.

2002-2006, Christian Glahn.

2006-2009, Petr Pajas.

perl v5.16.3 2013-05-13 LibXML(3)