LibXML(3) User Contributed Perl Documentation LibXML(3)

6 XML::LibXML - Perl Binding for libxml2
7
use XML::LibXML;

# Parse an XML document from a string
my $dom = XML::LibXML->load_xml(string => <<'EOT');
<some-xml/>
EOT

# libxml2 version information
$Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;
$Version_ID = XML::LibXML::LIBXML_VERSION;
$DLL_Version = XML::LibXML::LIBXML_RUNTIME_VERSION;

# Exchange nodes with XML::GDOME (experimental)
$libxmlnode = XML::LibXML->import_GDOME( $node, $deep );
$gdomenode = XML::LibXML->export_GDOME( $node, $deep );
19
This module is an interface to libxml2, providing XML and HTML parsers
with DOM, SAX and XMLReader interfaces, a large subset of the DOM Level 3
interface, and an XML::XPath-like interface to the XPath API of libxml2. The
module is split into several packages which are not described in this
section; unless stated otherwise, you only need to "use XML::LibXML;"
in your programs.
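
As a minimal sketch of the DOM and XPath interfaces mentioned above, the following parses a small in-memory document and queries it. The XML content and the variable names are made up for this illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Parse a small in-memory document (illustrative content only).
my $dom = XML::LibXML->load_xml(string => <<'EOT');
<library>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</library>
EOT

# DOM interface: start from the root element.
my $root = $dom->documentElement;
print "root element: ", $root->nodeName, "\n";

# XPath interface: select child elements and read their string values.
for my $book ($root->findnodes('book')) {
    printf "%s: %s\n", $book->getAttribute('id'), $book->findvalue('title');
}
```

Only "use XML::LibXML;" is needed here; the node classes are loaded along with the main package.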
27
28 Check out XML::LibXML by Example
29 (<http://grantm.github.io/perl-libxml-by-example/>) for a tutorial.
30
31 For further information, please check the following documentation:
32
33 XML::LibXML::Parser
34 Parsing XML files with XML::LibXML
35
36 XML::LibXML::DOM
37 XML::LibXML Document Object Model (DOM) Implementation
38
39 XML::LibXML::SAX
40 XML::LibXML direct SAX parser
41
42 XML::LibXML::Reader
43 Reading XML with a pull-parser
44
45 XML::LibXML::Dtd
46 XML::LibXML frontend for DTD validation
47
48 XML::LibXML::RelaxNG
49 XML::LibXML frontend for RelaxNG schema validation
50
51 XML::LibXML::Schema
52 XML::LibXML frontend for W3C Schema schema validation
53
54 XML::LibXML::XPathContext
55 API for evaluating XPath expressions with enhanced support for the
56 evaluation context
57
58 XML::LibXML::InputCallback
59 Implementing custom URI Resolver and input callbacks
60
61 XML::LibXML::Common
62 Common functions for XML::LibXML related Classes
63
64 The nodes in the Document Object Model (DOM) are represented by the
65 following classes (most of which "inherit" from XML::LibXML::Node):
66
67 XML::LibXML::Document
68 XML::LibXML class for DOM document nodes
69
70 XML::LibXML::Node
71 Abstract base class for XML::LibXML DOM nodes
72
73 XML::LibXML::Element
74 XML::LibXML class for DOM element nodes
75
76 XML::LibXML::Text
77 XML::LibXML class for DOM text nodes
78
79 XML::LibXML::Comment
80 XML::LibXML class for comment DOM nodes
81
82 XML::LibXML::CDATASection
83 XML::LibXML class for DOM CDATA sections
84
85 XML::LibXML::Attr
86 XML::LibXML DOM attribute class
87
88 XML::LibXML::DocumentFragment
89 XML::LibXML's DOM L2 Document Fragment implementation
90
91 XML::LibXML::Namespace
92 XML::LibXML DOM namespace nodes
93
94 XML::LibXML::PI
95 XML::LibXML DOM processing instruction nodes
96
Recall that since version 5.6.1, Perl distinguishes between character
strings (internally encoded in UTF-8) and so-called binary data and,
accordingly, applies either character or byte semantics to them. A
scalar representing a character string is distinguished from a byte
string by a special flag (UTF8). Please refer to perlunicode for
details.
104
105 XML::LibXML's API is designed to deal with many encodings of XML
106 documents completely transparently, so that the application using
107 XML::LibXML can be completely ignorant about the encoding of the XML
108 documents it works with. On the other hand, functions like
109 "XML::LibXML::Document->setEncoding" give the user control over the
110 document encoding.
111
112 To ensure the aforementioned transparency and uniformity, most
113 functions of XML::LibXML that work with in-memory trees accept and
114 return data as character strings (i.e. UTF-8 encoded with the UTF8 flag
115 on) regardless of the original document encoding; however, the
116 functions related to I/O operations (i.e. parsing and saving) operate
117 with binary data (in the original document encoding) obeying the
118 encoding declaration of the XML documents.
119
120 Below we summarize basic rules and principles regarding encoding:
121
1. Do NOT apply any encoding-related PerlIO layers (":utf8" or
":encoding(...)") to file handles that are an input for the parser
or an output for a serializer of (full) XML documents. This is
because the conversion of the data to/from the internal character
representation is performed by libxml2 itself, which must be able to
enforce the encoding specified by the "<?xml version="1.0"
encoding="..."?>" declaration. Here is an example to follow:
129
use XML::LibXML;

# load
open my $fh, '<', 'file.xml' or die "Cannot open file.xml: $!";
binmode $fh; # drop all PerlIO layers possibly created by a use open pragma
my $doc = XML::LibXML->load_xml(IO => $fh);
close $fh;

# save
open my $out, '>', 'out.xml' or die "Cannot open out.xml: $!";
binmode $out; # as above
$doc->toFH($out);
# or, instead of toFH:
# print {$out} $doc->toString();
close $out or die "Cannot close out.xml: $!";
142
2. All functions working with DOM accept and return character strings
(UTF-8 encoded with UTF8 flag on). E.g.

my $doc = XML::LibXML::Document->new('1.0', $some_encoding);
my $element = $doc->createElement($name);
$element->appendText($text);
my $xml_fragment = $element->toString(); # returns a character string
my $xml_document = $doc->toString(); # returns a byte string

where $some_encoding is the document encoding that will be used
when saving the document, and $name and $text contain character
strings (UTF-8 encoded with UTF8 flag on). Note that the method
"toString" returns XML as a character string if applied to a node
other than the Document node, and as a byte string containing the
appropriate

<?xml version="1.0" encoding="..."?>

declaration if applied to an XML::LibXML::Document.
162
163 3. DOM methods also accept binary strings in the original encoding of
164 the document to which the node belongs (UTF-8 is assumed if the
165 node is not attached to any document). Exploiting this feature is
166 NOT RECOMMENDED since it is considered bad practice.
167
my $doc = XML::LibXML::Document->new('1.0', 'iso-8859-2');
my $text = $doc->createTextNode($some_latin2_encoded_byte_string);
# WORKS, BUT NOT RECOMMENDED!
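
The difference between the character strings of rule 2 and the byte strings produced by document serialization can be observed with a short sketch like the one below. It assumes the core Encode module for turning input bytes into a character string; the element name and text are made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);
use XML::LibXML;

# Turn UTF-8 input bytes into a Perl character string first (rule 2).
my $text = decode('UTF-8', "caf\xc3\xa9");   # "cafe" with U+00E9 as the last letter

my $doc  = XML::LibXML::Document->new('1.0', 'iso-8859-1');
my $root = $doc->createElement('greeting');
$doc->setDocumentElement($root);
$root->appendText($text);

# Serializing a single node yields a character string ...
my $fragment = $root->toString();

# ... while serializing the document yields bytes in the document
# encoding, preceded by the matching XML declaration.
my $bytes = $doc->toString();

printf "fragment: %d characters\n", length $fragment;
printf "document: %d bytes\n",      length $bytes;
```

Because $bytes is already encoded, it should be written to a file handle that has no encoding layer, as described in rule 1.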
171
172 NOTE: libxml2 support for many encodings is based on the iconv library.
173 The actual list of supported encodings may vary from platform to
174 platform. To test if your platform works correctly with your language
175 encoding, build a simple document in the particular encoding and try to
176 parse it with XML::LibXML to see if the parser produces any errors.
177 Occasional crashes were reported on rare platforms that ship with a
178 broken version of iconv.
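
The check suggested in the note can be sketched roughly as follows. The helper name encoding_parses is hypothetical, and the sketch assumes that Perl's core Encode module knows the encoding under the same name that libxml2 (or iconv) uses, which is not guaranteed for every encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);
use XML::LibXML;

# Hypothetical helper: returns 1 if a tiny document in the given encoding
# parses on this platform and its text survives the round trip, else 0.
# $sample_text must be a character string without XML special characters.
sub encoding_parses {
    my ($encoding, $sample_text) = @_;
    my $ok = eval {
        my $xml = qq{<?xml version="1.0" encoding="$encoding"?>\n}
                . "<test>$sample_text</test>";
        my $bytes = encode($encoding, $xml);   # byte string in the target encoding
        my $doc = XML::LibXML->load_xml(string => $bytes);
        $doc->documentElement->textContent eq $sample_text;
    };
    return $ok ? 1 : 0;
}

for my $encoding ('UTF-8', 'iso-8859-1') {
    printf "%-12s %s\n", $encoding,
        encoding_parses($encoding, "caf\x{e9}") ? 'ok' : 'FAILED';
}
```

A failure here can come either from Perl's Encode or from libxml2, so a negative result only indicates that the encoding needs a closer look on that platform.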
179
XML::LibXML since 1.67 partially supports Perl threads in Perl >=
5.8.8. XML::LibXML can be used with threads in two ways:

By default, all XML::LibXML classes use the CLONE_SKIP class method to
prevent Perl from copying XML::LibXML::* objects when a new thread is
spawned. In this mode, all XML::LibXML::* objects are thread-specific.
This is the safest way to work with XML::LibXML in threads.
188
Alternatively, one may use

use threads;
use XML::LibXML qw(:threads_shared);

to indicate that all XML::LibXML node and parser objects should be
shared between the main thread and any thread spawned from there. For
example, in

my $doc = XML::LibXML->load_xml(location => $filename);
my $thr = threads->new(sub{
    # code working with $doc
    1;
});
$thr->join;

the variable $doc refers to the exact same XML::LibXML::Document in the
spawned thread as in the main thread.
207
208 Without using mutex locks, parallel threads may read the same document
209 (i.e. any node that belongs to the document), parse files, and modify
210 different documents.
211
However, if there is a chance that some of the threads will attempt to
modify a document (or even create new nodes based on that document,
e.g. with "$doc->createElement") that other threads may be reading at
the same time, the user is responsible for creating a mutex lock and
using it in both the thread that modifies and the thread that reads:

use threads;
use threads::shared;
use XML::LibXML qw(:threads_shared);

my $doc = XML::LibXML->load_xml(location => $filename);
my $mutex : shared;
my $thr = threads->new(sub{
    lock $mutex;
    my $el = $doc->createElement('foo');
    # ...
    1;
});
{
    lock $mutex;
    my $root = $doc->documentElement;
    print $root->nodeName, "\n";
}
$thr->join;
232
233 Note that libxml2 uses dictionaries to store short strings and these
234 dictionaries are kept on a document node. Without mutex locks, it could
235 happen in the previous example that the thread modifies the dictionary
236 while other threads attempt to read from it, which could easily lead to
237 a crash.
238
Sometimes it is useful to find out which version of libxml2 XML::LibXML
was compiled for. In most cases this is for debugging or to check whether
a given installation provides all the functionality the package needs. The
functions XML::LibXML::LIBXML_DOTTED_VERSION and
XML::LibXML::LIBXML_VERSION provide this version information. Both
functions simply pass through the values of the similarly named macros of
libxml2. Similarly, XML::LibXML::LIBXML_RUNTIME_VERSION returns the
version of the (usually dynamically) linked libxml2.
248
249 XML::LibXML::LIBXML_DOTTED_VERSION
250 $Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;
251
252 Returns the version string of the libxml2 version XML::LibXML was
253 compiled for. This will be "2.6.2" for "libxml2 2.6.2".
254
255 XML::LibXML::LIBXML_VERSION
256 $Version_ID = XML::LibXML::LIBXML_VERSION;
257
Returns the version id of the libxml2 version XML::LibXML was
compiled for. This will be "20602" for "libxml2 2.6.2". Do not confuse
this version id with $XML::LibXML::VERSION. The latter contains the
version of XML::LibXML itself, while the former contains the version
of libxml2 that XML::LibXML was compiled for.
263
264 XML::LibXML::LIBXML_RUNTIME_VERSION
265 $DLL_Version = XML::LibXML::LIBXML_RUNTIME_VERSION;
266
267 Returns a version string of the libxml2 which is (usually
268 dynamically) linked by XML::LibXML. This will be "20602" for
269 libxml2 released as "2.6.2" and something like "20602-CVS2032" for
270 a CVS build of libxml2.
271
272 XML::LibXML issues a warning if the version of libxml2 dynamically
273 linked to it is less than the version of libxml2 which it was
274 compiled against.
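
A minimal sketch of how the three functions might be combined for a diagnostic report follows. The variable names and the regular expression that strips a possible suffix such as "-CVS2032" are choices made for this illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $dotted   = XML::LibXML::LIBXML_DOTTED_VERSION();   # compile-time version string
my $compiled = XML::LibXML::LIBXML_VERSION();          # compile-time version id
my $runtime  = XML::LibXML::LIBXML_RUNTIME_VERSION();  # version of the linked libxml2

# Keep only the leading digits of the runtime version so that it can be
# compared numerically with the compile-time version id.
my ($runtime_id) = $runtime =~ /^(\d+)/;

print "XML::LibXML $XML::LibXML::VERSION\n";
print "compiled against libxml2 $dotted (id $compiled)\n";
print "running with libxml2 $runtime\n";

warn "the linked libxml2 is older than the one used at compile time\n"
    if defined $runtime_id && $runtime_id < $compiled;
```

Such a report is mainly useful in bug reports, where the module version and both libxml2 versions are usually requested.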
275
277 By default the module exports all constants and functions listed in the
278 :all tag, described below.
279
281 ":all"
282 Includes the tags ":libxml", ":encoding", and ":ns" described
283 below.
284
285 ":libxml"
286 Exports integer constants for DOM node types.
287
XML_ELEMENT_NODE            => 1
XML_ATTRIBUTE_NODE          => 2
XML_TEXT_NODE               => 3
XML_CDATA_SECTION_NODE      => 4
XML_ENTITY_REF_NODE         => 5
XML_ENTITY_NODE             => 6
XML_PI_NODE                 => 7
XML_COMMENT_NODE            => 8
XML_DOCUMENT_NODE           => 9
XML_DOCUMENT_TYPE_NODE      => 10
XML_DOCUMENT_FRAG_NODE      => 11
XML_NOTATION_NODE           => 12
XML_HTML_DOCUMENT_NODE      => 13
XML_DTD_NODE                => 14
XML_ELEMENT_DECL            => 15
XML_ATTRIBUTE_DECL          => 16
XML_ENTITY_DECL             => 17
XML_NAMESPACE_DECL          => 18
XML_XINCLUDE_START          => 19
XML_XINCLUDE_END            => 20
308
309 ":encoding"
310 Exports two encoding conversion functions from XML::LibXML::Common.
311
312 encodeToUTF8()
313 decodeFromUTF8()
314
315 ":ns"
316 Exports two convenience constants: the implicit namespace of the
317 reserved "xml:" prefix, and the implicit namespace for the reserved
318 "xmlns:" prefix.
319
XML_XML_NS   => 'http://www.w3.org/XML/1998/namespace'
XML_XMLNS_NS => 'http://www.w3.org/2000/xmlns/'
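
As a small sketch of how the exported constants might be used, the following classifies the children of an element by node type and reads an attribute in the reserved "xml:" namespace. The tags are imported explicitly here for clarity, and the sample document is made up for this illustration.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml :ns);

my $doc = XML::LibXML->load_xml(
    string => '<root xml:lang="en">text<!-- note --><child/></root>'
);
my $root = $doc->documentElement;

# Count the children of the root element by node type.
my %count = (element => 0, text => 0, comment => 0);
for my $node ($root->childNodes) {
    my $type = $node->nodeType;
    if    ($type == XML_ELEMENT_NODE) { $count{element}++ }
    elsif ($type == XML_TEXT_NODE)    { $count{text}++ }
    elsif ($type == XML_COMMENT_NODE) { $count{comment}++ }
}
print "$_: $count{$_}\n" for sort keys %count;

# Look up xml:lang through the namespace URI of the reserved prefix.
my $lang = $root->getAttributeNS(XML_XML_NS, 'lang');
print "language: $lang\n";
```

Comparing against the named constants rather than the bare integers keeps such code readable.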
322
324 The modules described in this section are not part of the XML::LibXML
325 package itself. As they support some additional features, they are
326 mentioned here.
327
328 XML::LibXSLT
329 XSLT 1.0 Processor using libxslt and XML::LibXML
330
331 XML::LibXML::Iterator
332 XML::LibXML Implementation of the DOM Traversal Specification
333
XML::CompactTree::XS
Uses XML::LibXML::Reader to parse an XML document or element very
efficiently into native Perl data structures, which are less
flexible but significantly faster to process than DOM.
338
340 Note: THE FUNCTIONS DESCRIBED HERE ARE STILL EXPERIMENTAL
341
Although both XML::LibXML and XML::GDOME make use of libxml2's XML
capabilities, the DOM implementations of the two modules are not
compatible. It is still possible to exchange nodes from one DOM to the
other. The concept of this exchange is similar to the function
cloneNode(): the particular node is copied at a low level to the
opposite DOM implementation.

Since the DOM implementations cannot coexist within one document, one
is forced to copy each node that should be used. Because you are always
keeping two copies of each node, this may have a considerable impact on
a machine's memory usage.

XML::LibXML provides two functions to export or import GDOME nodes:
import_GDOME() and export_GDOME(). Both functions take two parameters:
the node and a flag for recursive import. The flag works as in
cloneNode().

The two functions allow one to export and import XML::GDOME nodes
explicitly. However, XML::LibXML also allows the transparent import of
XML::GDOME nodes in functions such as appendChild(), insertAfter() and
so on. While native nodes are automatically adopted in most functions,
XML::GDOME nodes are always cloned in advance. Thus, if the original
node is modified after the operation, the node in the XML::LibXML
document will not reflect the change.
366
367 import_GDOME
368 $libxmlnode = XML::LibXML->import_GDOME( $node, $deep );
369
370 This clones an XML::GDOME node to an XML::LibXML node explicitly.
371
372 export_GDOME
373 $gdomenode = XML::LibXML->export_GDOME( $node, $deep );
374
375 Allows one to clone an XML::LibXML node into an XML::GDOME node.
376
378 For bug reports, please use the CPAN request tracker on
379 http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-LibXML
380
381 For suggestions etc., and other issues related to XML::LibXML you may
382 use the perl XML mailing list ("perl-xml@listserv.ActiveState.com"),
383 where most XML-related Perl modules are discussed. In case of problems
384 you should check the archives of that list first. Many problems are
385 already discussed there. You can find the list's archives and
386 subscription options at
387 <http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml>.
388
390 Matt Sergeant, Christian Glahn, Petr Pajas
391
393 2.0205
394
396 2001-2007, AxKit.com Ltd.
397
398 2002-2006, Christian Glahn.
399
400 2006-2009, Petr Pajas.
401
403 This program is free software; you can redistribute it and/or modify it
404 under the same terms as Perl itself.
405
406
407
408perl v5.32.0 2020-07-28 LibXML(3)