LibXML(3)             User Contributed Perl Documentation            LibXML(3)

NAME

       XML::LibXML - Perl Binding for libxml2

SYNOPSIS

         use XML::LibXML;
         my $dom = XML::LibXML->load_xml(string => <<'EOT');
         <some-xml/>
         EOT

         $Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;
         $Version_ID = XML::LibXML::LIBXML_VERSION;
         $DLL_Version = XML::LibXML::LIBXML_RUNTIME_VERSION;
         $libxmlnode = XML::LibXML->import_GDOME( $node, $deep );
         $gdomenode = XML::LibXML->export_GDOME( $node, $deep );

DESCRIPTION

       This module is an interface to libxml2, providing XML and HTML parsers
       with DOM, SAX and XMLReader interfaces, a large subset of the DOM
       Level 3 interface, and an XML::XPath-like interface to the XPath API
       of libxml2. The module is split into several packages which are not
       described in this section; unless stated otherwise, you only need to
       "use XML::LibXML;" in your programs.

       Check out XML::LibXML by Example
       (<http://grantm.github.io/perl-libxml-by-example/>) for a tutorial.

       For further information, please check the following documentation:

       XML::LibXML::Parser
           Parsing XML files with XML::LibXML

       XML::LibXML::DOM
           XML::LibXML Document Object Model (DOM) Implementation

       XML::LibXML::SAX
           XML::LibXML direct SAX parser

       XML::LibXML::Reader
           Reading XML with a pull-parser

       XML::LibXML::Dtd
           XML::LibXML frontend for DTD validation

       XML::LibXML::RelaxNG
           XML::LibXML frontend for RelaxNG schema validation

       XML::LibXML::Schema
           XML::LibXML frontend for W3C Schema schema validation

       XML::LibXML::XPathContext
           API for evaluating XPath expressions with enhanced support for the
           evaluation context

       XML::LibXML::InputCallback
           Implementing custom URI Resolver and input callbacks

       XML::LibXML::Common
           Common functions for XML::LibXML related Classes

       The nodes in the Document Object Model (DOM) are represented by the
       following classes (most of which "inherit" from XML::LibXML::Node):

       XML::LibXML::Document
           XML::LibXML class for DOM document nodes

       XML::LibXML::Node
           Abstract base class for XML::LibXML DOM nodes

       XML::LibXML::Element
           XML::LibXML class for DOM element nodes

       XML::LibXML::Text
           XML::LibXML class for DOM text nodes

       XML::LibXML::Comment
           XML::LibXML class for comment DOM nodes

       XML::LibXML::CDATASection
           XML::LibXML class for DOM CDATA sections

       XML::LibXML::Attr
           XML::LibXML DOM attribute class

       XML::LibXML::DocumentFragment
           XML::LibXML's DOM L2 Document Fragment implementation

       XML::LibXML::Namespace
           XML::LibXML DOM namespace nodes

       XML::LibXML::PI
           XML::LibXML DOM processing instruction nodes

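       As a minimal sketch of how the node classes listed above show up in
       practice, the following parses a small document and prints the class
       of each node it finds. The sample XML and the variable names are made
       up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative input: one element holding an attribute, a comment,
# a text node, a CDATA section and a processing instruction.
my $doc = XML::LibXML->load_xml(string => <<'EOT');
<root id="r1"><!-- note -->text<![CDATA[raw]]><?target data?></root>
EOT

my $root = $doc->documentElement;

# Collect the Perl class of the document, its root element,
# each child of the root, and the root's attribute.
my @classes = (ref $doc, ref $root);
push @classes, ref $_ for $root->childNodes;
my ($attr) = $root->attributes;
push @classes, ref $attr;

print "$_\n" for @classes;
```

       Because most of these classes inherit from XML::LibXML::Node, a test
       such as "$node->isa('XML::LibXML::Node')" is a convenient way to check
       that a value is a DOM node before calling node methods on it.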

ENCODINGS SUPPORT IN XML::LIBXML

       Recall that since version 5.6.1, Perl distinguishes between character
       strings (internally encoded in UTF-8) and so-called binary data and,
       accordingly, applies either character or byte semantics to them. A
       scalar representing a character string is distinguished from a byte
       string by a special flag (UTF8). Please refer to perlunicode for
       details.

       XML::LibXML's API is designed to deal with many encodings of XML
       documents completely transparently, so that the application using
       XML::LibXML can be completely ignorant of the encoding of the XML
       documents it works with. On the other hand, functions like
       "XML::LibXML::Document->setEncoding" give the user control over the
       document encoding.

       To ensure the aforementioned transparency and uniformity, most
       functions of XML::LibXML that work with in-memory trees accept and
       return data as character strings (i.e. UTF-8 encoded with the UTF8 flag
       on) regardless of the original document encoding; however, the
       functions related to I/O operations (i.e. parsing and saving) operate
       with binary data (in the original document encoding) obeying the
       encoding declaration of the XML documents.

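       The split between character strings inside the tree and bytes at the
       I/O boundary can be sketched as follows. The sample document is
       invented for illustration, and the sketch assumes that the local
       libxml2 build supports ISO-8859-1.

```perl
use strict;
use warnings;
use XML::LibXML;

# A byte string holding an ISO-8859-1 document; "\xE9" is e-acute there.
my $bytes = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<p>caf\xE9</p>\n};

# Parsing is an I/O operation: it consumes bytes and honours the
# encoding declaration.
my $doc = XML::LibXML->load_xml(string => $bytes);

# Tree access yields a character string, whatever the document encoding.
my $text = $doc->documentElement->textContent;
printf "text has %d characters\n", length $text;

# Serializing the whole document yields bytes in the document encoding.
my $latin1 = $doc->toString;

# setEncoding changes the encoding used for later serialization.
$doc->setEncoding('UTF-8');
my $utf8 = $doc->toString;

printf "ISO-8859-1 output: %d bytes, UTF-8 output: %d bytes\n",
    length $latin1, length $utf8;
```

       The application code never converts the text itself; it only chooses
       the encoding that libxml2 should use when the document is written out.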

       Below we summarize basic rules and principles regarding encoding:

       1.  Do NOT apply any encoding-related PerlIO layers (":utf8" or
           ":encoding(...)") to file handles that are an input for the parser
           or an output for a serializer of (full) XML documents. This is
           because the conversion of the data to/from the internal character
           representation is provided by libxml2 itself, which must be able to
           enforce the encoding specified by the "<?xml version="1.0"
           encoding="..."?>" declaration. Here is an example to follow:

             use XML::LibXML;
             # load
             open my $fh, '<', 'file.xml' or die "Cannot open file.xml: $!";
             binmode $fh; # drop all PerlIO layers possibly created by a use open pragma
             my $doc = XML::LibXML->load_xml(IO => $fh);

             # save
             open my $out, '>', 'out.xml' or die "Cannot open out.xml: $!";
             binmode $out; # as above
             $doc->toFH($out);
             # or
             print {$out} $doc->toString();

       2.  All functions working with DOM accept and return character strings
           (UTF-8 encoded with UTF8 flag on). E.g.

             my $doc = XML::LibXML::Document->new('1.0',$some_encoding);
             my $element = $doc->createElement($name);
             $element->appendText($text);
             my $xml_fragment = $element->toString(); # returns a character string
             my $xml_document = $doc->toString(); # returns a byte string

           where $some_encoding is the document encoding that will be used
           when saving the document, and $name and $text contain character
           strings (UTF-8 encoded with UTF8 flag on). Note that the method
           "toString" returns XML as a character string if applied to any
           node other than the Document node, and a byte string containing
           the appropriate

             <?xml version="1.0" encoding="..."?>

           declaration if applied to an XML::LibXML::Document.

       3.  DOM methods also accept binary strings in the original encoding of
           the document to which the node belongs (UTF-8 is assumed if the
           node is not attached to any document). Exploiting this feature is
           NOT RECOMMENDED since it is considered bad practice.

             my $doc = XML::LibXML::Document->new('1.0','iso-8859-2');
             my $text = $doc->createTextNode($some_latin2_encoded_byte_string);
             # WORKS, BUT NOT RECOMMENDED!

       NOTE: libxml2 support for many encodings is based on the iconv library.
       The actual list of supported encodings may vary from platform to
       platform. To test if your platform works correctly with your language
       encoding, build a simple document in the particular encoding and try to
       parse it with XML::LibXML to see if the parser produces any errors.
       Occasional crashes were reported on rare platforms that ship with a
       broken version of iconv.

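       The platform test suggested in the note above might look like the
       following sketch. The helper "encoding_works" and the sample table are
       hypothetical names introduced here, not part of XML::LibXML; each
       sample is the single character U+010D encoded in the named encoding.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample data (an assumption of this sketch): the bytes that encode
# U+010D (LATIN SMALL LETTER C WITH CARON) in each encoding.
my %sample = (
    'UTF-8'      => "\xC4\x8D",
    'ISO-8859-2' => "\xE8",
);

# Hypothetical helper: build a tiny document declared in $encoding,
# parse it, and check that the text comes back as U+010D.
sub encoding_works {
    my ($encoding, $bytes) = @_;
    my $xml = qq{<?xml version="1.0" encoding="$encoding"?>\n<t>$bytes</t>\n};
    my $doc = eval { XML::LibXML->load_xml(string => $xml) };
    return 0 unless defined $doc;    # parser raised an error
    return $doc->documentElement->textContent eq "\x{10D}" ? 1 : 0;
}

for my $encoding (sort keys %sample) {
    printf "%-12s %s\n", $encoding,
        encoding_works($encoding, $sample{$encoding}) ? 'ok' : 'FAILED';
}
```

       Extending the table with the encodings an application actually needs
       gives a quick check that can be run once on each target platform.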

THREAD SUPPORT

       XML::LibXML since 1.67 partially supports Perl threads in Perl >=
       5.8.8.  XML::LibXML can be used with threads in two ways:

       By default, all XML::LibXML classes use the CLONE_SKIP class method to
       prevent Perl from copying XML::LibXML::* objects when a new thread is
       spawned. In this mode, all XML::LibXML::* objects are thread-specific.
       This is the safest way to work with XML::LibXML in threads.

       Alternatively, one may use

         use threads;
         use XML::LibXML qw(:threads_shared);

       to indicate that all XML::LibXML node and parser objects should be
       shared between the main thread and any thread spawned from there. For
       example, in

         my $doc = XML::LibXML->load_xml(location => $filename);
         my $thr = threads->new(sub{
           # code working with $doc
           1;
         });
         $thr->join;

       the variable $doc refers to the exact same XML::LibXML::Document in the
       spawned thread as in the main thread.

       Without using mutex locks, parallel threads may read the same document
       (i.e. any node that belongs to the document), parse files, and modify
       different documents.

       However, if there is a chance that some of the threads will attempt to
       modify a document (or even create new nodes based on that document,
       e.g. with "$doc->createElement") that other threads may be reading at
       the same time, the user is responsible for creating a mutex lock and
       using it in both the thread that modifies and the thread that reads:

         my $doc = XML::LibXML->load_xml(location => $filename);
         my $mutex : shared;
         my $thr = threads->new(sub{
           lock $mutex;
           my $el = $doc->createElement('foo');
           # ...
           1;
         });
         {
           lock $mutex;
           my $root = $doc->documentElement;
           print $root->nodeName, "\n";
         }
         $thr->join;

       Note that libxml2 uses dictionaries to store short strings and these
       dictionaries are kept on a document node. Without mutex locks, it could
       happen in the previous example that the thread modifies the dictionary
       while other threads attempt to read from it, which could easily lead to
       a crash.

VERSION INFORMATION

       Sometimes it is useful to find out which version of libxml2
       XML::LibXML was compiled for. In most cases this is for debugging or to
       check whether a given installation provides all the functionality
       required by the package. The functions
       XML::LibXML::LIBXML_DOTTED_VERSION and XML::LibXML::LIBXML_VERSION
       provide this version information. Both functions simply pass through
       the values of the similarly named macros of libxml2. Similarly,
       XML::LibXML::LIBXML_RUNTIME_VERSION returns the version of the (usually
       dynamically) linked libxml2.

       XML::LibXML::LIBXML_DOTTED_VERSION
             $Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;

           Returns the version string of the libxml2 version XML::LibXML was
           compiled for.  This will be "2.6.2" for "libxml2 2.6.2".

       XML::LibXML::LIBXML_VERSION
             $Version_ID = XML::LibXML::LIBXML_VERSION;

           Returns the version id of the libxml2 version XML::LibXML was
           compiled for.  This will be "20602" for "libxml2 2.6.2". Do not
           confuse this version id with $XML::LibXML::VERSION. The latter
           contains the version of XML::LibXML itself, while the former
           contains the version of libxml2 XML::LibXML was compiled for.

       XML::LibXML::LIBXML_RUNTIME_VERSION
             $DLL_Version = XML::LibXML::LIBXML_RUNTIME_VERSION;

           Returns a version string of the libxml2 which is (usually
           dynamically) linked by XML::LibXML. This will be "20602" for
           libxml2 released as "2.6.2" and something like "20602-CVS2032" for
           a CVS build of libxml2.

           XML::LibXML issues a warning if the version of libxml2 dynamically
           linked to it is less than the version of libxml2 which it was
           compiled against.

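       A short diagnostic built on these three functions could look like the
       sketch below. The variable names are illustrative, and the numeric
       comparison assumes that the runtime version string starts with the
       numeric id, as in the "20602-CVS2032" example above.

```perl
use strict;
use warnings;
use XML::LibXML;

my $dotted   = XML::LibXML::LIBXML_DOTTED_VERSION();   # e.g. "2.6.2"
my $compiled = XML::LibXML::LIBXML_VERSION();          # e.g. 20602
my $runtime  = XML::LibXML::LIBXML_RUNTIME_VERSION();  # may carry a suffix

# Keep only the leading digits so a suffixed build still compares numerically.
my ($runtime_num) = $runtime =~ /^(\d+)/;

print "XML::LibXML version:        $XML::LibXML::VERSION\n";
print "compiled against libxml2:   $dotted (id $compiled)\n";
print "running with libxml2 id:    $runtime\n";

if (defined $runtime_num && $runtime_num < $compiled) {
    print "note: runtime libxml2 is older than the compile-time version\n";
}
```

       Including this output in a bug report makes it clear which libxml2
       builds are involved.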

EXPORTS

       By default the module exports all constants and functions listed in the
       :all tag, described below.

EXPORT TAGS

       ":all"
           Includes the tags ":libxml", ":encoding", and ":ns" described
           below.

       ":libxml"
           Exports integer constants for DOM node types.

             XML_ELEMENT_NODE            => 1
             XML_ATTRIBUTE_NODE          => 2
             XML_TEXT_NODE               => 3
             XML_CDATA_SECTION_NODE      => 4
             XML_ENTITY_REF_NODE         => 5
             XML_ENTITY_NODE             => 6
             XML_PI_NODE                 => 7
             XML_COMMENT_NODE            => 8
             XML_DOCUMENT_NODE           => 9
             XML_DOCUMENT_TYPE_NODE      => 10
             XML_DOCUMENT_FRAG_NODE      => 11
             XML_NOTATION_NODE           => 12
             XML_HTML_DOCUMENT_NODE      => 13
             XML_DTD_NODE                => 14
             XML_ELEMENT_DECL            => 15
             XML_ATTRIBUTE_DECL          => 16
             XML_ENTITY_DECL             => 17
             XML_NAMESPACE_DECL          => 18
             XML_XINCLUDE_START          => 19
             XML_XINCLUDE_END            => 20

       ":encoding"
           Exports two encoding conversion functions from XML::LibXML::Common.

             encodeToUTF8()
             decodeFromUTF8()

       ":ns"
           Exports two convenience constants: the implicit namespace of the
           reserved "xml:" prefix, and the implicit namespace for the reserved
           "xmlns:" prefix.

             XML_XML_NS    => 'http://www.w3.org/XML/1998/namespace'
             XML_XMLNS_NS  => 'http://www.w3.org/2000/xmlns/'

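       The exported names from the tags above can be used as in the following
       sketch. The sample document and variable names are invented for
       illustration, and the encodeToUTF8 call assumes the argument order
       (encoding name, then string) documented in XML::LibXML::Common.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml :encoding :ns);

my $doc = XML::LibXML->load_xml(
    string => '<doc xml:lang="en">Hello<!-- a comment --></doc>');
my $root = $doc->documentElement;

# :libxml - dispatch on node type without hard-coding the integers.
my @seen;
for my $node ($root->childNodes) {
    my $type = $node->nodeType;
    if    ($type == XML_TEXT_NODE)    { push @seen, 'text' }
    elsif ($type == XML_COMMENT_NODE) { push @seen, 'comment' }
}
print "children: @seen\n";

# :ns - look up an attribute in the reserved "xml:" namespace.
my $lang = $root->getAttributeNS(XML_XML_NS, 'lang');
print "xml:lang = $lang\n";

# :encoding - turn an ISO-8859-1 byte string into a character string.
my $chars = encodeToUTF8('iso-8859-1', "caf\xE9");
printf "converted string has %d characters\n", length $chars;
```

       Importing only the tags that are needed keeps the caller's namespace
       small; ":all" is what a plain "use XML::LibXML;" provides.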

RELATED MODULES

       The modules described in this section are not part of the XML::LibXML
       package itself. As they support some additional features, they are
       mentioned here.

       XML::LibXSLT
           XSLT 1.0 Processor using libxslt and XML::LibXML

       XML::LibXML::Iterator
           XML::LibXML Implementation of the DOM Traversal Specification

       XML::CompactTree::XS
           Uses XML::LibXML::Reader to very efficiently parse an XML document
           or element into native Perl data structures, which are less
           flexible but significantly faster to process than DOM.

XML::LIBXML AND XML::GDOME

       Note: THE FUNCTIONS DESCRIBED HERE ARE STILL EXPERIMENTAL

       Although both modules make use of libxml2's XML capabilities, the DOM
       implementations of the two modules are not compatible. It is still
       possible to exchange nodes from one DOM to the other. The concept of
       this exchange is similar to the function cloneNode(): the particular
       node is copied at a low level to the opposite DOM implementation.

       Since the DOM implementations cannot coexist within one document, one
       is forced to copy each node that should be used. Because you are always
       keeping two nodes, this may have a considerable impact on a machine's
       memory usage.

       XML::LibXML provides two functions to export or import GDOME nodes:
       import_GDOME() and export_GDOME(). Both functions take two parameters:
       the node and a flag for recursive import. The flag works as in
       cloneNode().

       The two functions allow one to export and import XML::GDOME nodes
       explicitly; however, XML::LibXML also allows the transparent import of
       XML::GDOME nodes in functions such as appendChild(), insertAfter() and
       so on. While native nodes are automatically adopted in most functions,
       XML::GDOME nodes are always cloned in advance. Thus if the original
       node is modified after the operation, the node in the XML::LibXML
       document will not reflect the change.

       import_GDOME
             $libxmlnode = XML::LibXML->import_GDOME( $node, $deep );

           This clones an XML::GDOME node to an XML::LibXML node explicitly.

       export_GDOME
             $gdomenode = XML::LibXML->export_GDOME( $node, $deep );

           Allows one to clone an XML::LibXML node into an XML::GDOME node.

CONTACTS

       For bug reports, please use the CPAN request tracker on
       http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-LibXML

       For suggestions and other issues related to XML::LibXML you may use
       the perl XML mailing list ("perl-xml@listserv.ActiveState.com"), where
       most XML-related Perl modules are discussed. In case of problems you
       should check the archives of that list first. Many problems have
       already been discussed there. You can find the list's archives and
       subscription options at
       <http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml>.

AUTHORS

       Matt Sergeant, Christian Glahn, Petr Pajas

VERSION

       2.0207

COPYRIGHT

       2001-2007, AxKit.com Ltd.

       2002-2006, Christian Glahn.

       2006-2009, Petr Pajas.

LICENSE

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.34.0                      2021-07-23                         LibXML(3)