XML::LibXML::Document(3)  User Contributed Perl Documentation  XML::LibXML::Document(3)

NAME

6       XML::LibXML::Document - XML::LibXML DOM Document Class
7

SYNOPSIS

         use XML::LibXML;
         # Only methods specific to Document nodes are listed here,
         # see the XML::LibXML::Node manpage for other methods

         $dom = XML::LibXML::Document->new( $version, $encoding );
         $dom = XML::LibXML::Document->createDocument( $version, $encoding );
         $strURI = $doc->URI();
         $doc->setURI($strURI);
         $strEncoding = $doc->encoding();
         $strEncoding = $doc->actualEncoding();
         $doc->setEncoding($new_encoding);
         $strVersion = $doc->version();
         $doc->standalone
         $doc->setStandalone($numvalue);
         my $compression = $doc->compression;
         $doc->setCompression($ziplevel);
         $docstring = $dom->toString($format);
         $c14nstr = $doc->toStringC14N($comment_flag, $xpath [, $xpath_context ]);
         $ec14nstr = $doc->toStringEC14N($comment_flag, $xpath [, $xpath_context ], $inclusive_prefix_list);
         $str = $doc->serialize($format);
         $state = $doc->toFile($filename, $format);
         $state = $doc->toFH($fh, $format);
         $str = $document->toStringHTML();
         $str = $document->serialize_html();
         $bool = $dom->is_valid();
         $dom->validate();
         $root = $dom->documentElement();
         $dom->setDocumentElement( $root );
         $element = $dom->createElement( $nodename );
         $element = $dom->createElementNS( $namespaceURI, $nodename );
         $text = $dom->createTextNode( $content_text );
         $comment = $dom->createComment( $comment_text );
         $attrnode = $doc->createAttribute($name [,$value]);
         $attrnode = $doc->createAttributeNS( $namespaceURI, $name [,$value] );
         $fragment = $doc->createDocumentFragment();
         $cdata = $dom->createCDATASection( $cdata_content );
         my $pi = $doc->createProcessingInstruction( $target, $data );
         my $entref = $doc->createEntityReference($refname);
         $dtd = $document->createInternalSubset( $rootnode, $public, $system);
         $dtd = $document->createExternalSubset( $rootnode_name, $publicId, $systemId);
         $document->importNode( $node );
         $document->adoptNode( $node );
         my $dtd = $doc->externalSubset;
         my $dtd = $doc->internalSubset;
         $doc->setExternalSubset($dtd);
         $doc->setInternalSubset($dtd);
         my $dtd = $doc->removeExternalSubset();
         my $dtd = $doc->removeInternalSubset();
         my @nodelist = $doc->getElementsByTagName($tagname);
         my @nodelist = $doc->getElementsByTagNameNS($nsURI,$tagname);
         my @nodelist = $doc->getElementsByLocalName($localname);
         my $node = $doc->getElementById($id);
         $dom->indexElements();
62

DESCRIPTION

64       The Document Class is in most cases the result of a parsing process.
65       But sometimes it is necessary to create a Document from scratch. The
66       DOM Document Class provides functions that conform to the DOM Core
67       naming style.
68
69       It inherits all functions from XML::LibXML::Node as specified in the
70       DOM specification. This enables access to the nodes besides the root
71       element on document level - a "DTD" for example. The support for these
72       nodes is limited at the moment.
73
       While nodes are generally bound to a document in the DOM concept, it
       is suggested that one should always create a node not bound to any
       document. There is no need to actually insert the node into the
       document, but once a node is bound to a document, it is reasonably
       safe to assume that all of its strings have the correct encoding. If
       an unbound text node with an ISO encoded string is created (e.g. with
       $CLASS->new()), the "toString" function may not return the expected
       result.
81
82       To prevent such problems, it is recommended to pass all data to
83       XML::LibXML methods as character strings (i.e. UTF-8 encoded, with the
84       UTF8 flag on).
85

METHODS

       Many functions listed here are documented extensively in the DOM Level
       3 specification (<http://www.w3.org/TR/DOM-Level-3-Core/>); please
       refer to it for details.
90
91       new
92             $dom = XML::LibXML::Document->new( $version, $encoding );
93
94           alias for createDocument()
95
96       createDocument
97             $dom = XML::LibXML::Document->createDocument( $version, $encoding );
98
           The constructor for the document class. As parameters it takes the
           version string and (optionally) the encoding string. Simply calling
           createDocument() will create the document:

             <?xml version="your version" encoding="your encoding"?>

           Both parameters are optional. The default value for $version is
           1.0. If the $encoding parameter is not set, the encoding will be
           left unset, which means UTF-8 is implied.

           Calling createDocument() without any parameters will result in the
           following declaration:

             <?xml version="1.0"?>
113
114           Alternatively one can call this constructor directly from the
115           XML::LibXML class level, to avoid some typing. This will not have
116           any effect on the class instance, which is always
117           XML::LibXML::Document.
118
119             my $document = XML::LibXML->createDocument( "1.0", "UTF-8" );
120
121           is therefore a shortcut for
122
123             my $document = XML::LibXML::Document->createDocument( "1.0", "UTF-8" );
124
125       URI
126             $strURI = $doc->URI();
127
           Returns the URI (or filename) of the original document. For
           documents obtained by parsing a string or a FH without using the
           URI parsing argument of the corresponding "parse_*" function, the
           result is a generated string unknown-XYZ where XYZ is some number;
           for documents created with the constructor "new", the URI is
           undefined.

           The value can be modified by calling the "setURI" method on the
           document node.
137
138       setURI
139             $doc->setURI($strURI);
140
141           Sets the URI of the document reported by the method URI (see also
142           the URI argument to the various "parse_*" functions).
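
           As a minimal sketch of the two methods working together, the
           following assumes a document built with the constructor (so it
           starts without a URI). The location 'file:///tmp/example.xml' is
           a made-up value used only for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# A document built with the constructor has no URI until one is assigned.
my $doc    = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $before = $doc->URI;

# Hypothetical location, chosen only for this example.
$doc->setURI('file:///tmp/example.xml');
my $after = $doc->URI;

print defined $before ? "before: $before\n" : "before: (undefined)\n";
print "after:  $after\n";
```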
143
144       encoding
145             $strEncoding = $doc->encoding();
146
147           returns the encoding string of the document.
148
149             my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
150             print $doc->encoding; # prints ISO-8859-15
151
152       actualEncoding
153             $strEncoding = $doc->actualEncoding();
154
155           returns the encoding in which the XML will be returned by
156           $doc->toString().  This is usually the original encoding of the
157           document as declared in the XML declaration and returned by
158           $doc->encoding. If the original encoding is not known (e.g. if
159           created in memory or parsed from a XML without a declared
160           encoding), 'UTF-8' is returned.
161
             my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
             print $doc->actualEncoding; # prints ISO-8859-15
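
           The difference between encoding() and actualEncoding() is easiest
           to see on a document without a declared encoding. The sketch
           below assumes two in-memory documents; the variable names are
           illustrative only.

```perl
use strict;
use warnings;
use XML::LibXML;

# One document with a declared encoding, one without.
my $declared   = XML::LibXML::Document->new( '1.0', 'ISO-8859-15' );
my $undeclared = XML::LibXML::Document->new();

for my $doc ( $declared, $undeclared ) {
    my $enc = $doc->encoding;
    printf "encoding: %-12s actualEncoding: %s\n",
        defined $enc ? $enc : '(undefined)',
        $doc->actualEncoding;
}
```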
164
165       setEncoding
166             $doc->setEncoding($new_encoding);
167
           This method allows one to change the declaration of encoding in the
           XML declaration of the document. The value also affects the
           encoding in which the document is serialized to XML by
           $doc->toString(). Call setEncoding() without an argument to remove
           the encoding declaration.
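
           To illustrate how the declared encoding drives serialization,
           here is a small sketch that serializes the same tree twice. The
           element name and text are invented for the example, and the text
           is explicitly upgraded to a character string, as recommended in
           the DESCRIPTION.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $doc->createElement('word');
$doc->setDocumentElement($root);

# "caf" followed by U+00E9; upgrade so Perl treats it as a character string.
my $text = "caf\x{e9}";
utf8::upgrade($text);
$root->appendText($text);

# toString() on a document returns bytes in the document encoding.
my $utf8_bytes = $doc->toString;

$doc->setEncoding('ISO-8859-1');
my $latin1_bytes = $doc->toString;

printf "UTF-8 output:      %d bytes\n", length $utf8_bytes;
printf "ISO-8859-1 output: %d bytes\n", length $latin1_bytes;
```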
173
174       version
175             $strVersion = $doc->version();
176
177           returns the version string of the document
178
179           getVersion() is an alternative form of this function.
180
181       standalone
182             $doc->standalone
183
           This function returns the numerical value of the standalone
           attribute in a document's XML declaration. It returns 1 if
           standalone="yes" was found, 0 if standalone="no" was found and -1
           if standalone was not specified (default on creation).
188
189       setStandalone
190             $doc->setStandalone($numvalue);
191
           Through this method it is possible to alter the value of a
           document's standalone attribute. Set it to 1 to set
           standalone="yes", to 0 to set standalone="no", or to -1 to
           remove the standalone attribute from the XML declaration.
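
           A short sketch of the three states, assuming a freshly created
           document with a placeholder root element named 'root':

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );
$doc->setDocumentElement( $doc->createElement('root') );

my $initial = $doc->standalone;    # not specified on creation

# Helper (local to this example) returning only the XML declaration line.
sub declaration { return ( split /\n/, $_[0]->toString )[0] }

$doc->setStandalone(1);
my $with_yes = declaration($doc);

$doc->setStandalone(-1);
my $removed = declaration($doc);

print "initial value: $initial\n";
print "after setStandalone(1):  $with_yes\n";
print "after setStandalone(-1): $removed\n";
```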
196
197       compression
198             my $compression = $doc->compression;
199
200           libxml2 allows reading of documents directly from gzipped files. In
201           this case the compression variable is set to the compression level
202           of that file (0-8). If XML::LibXML parsed a different source or the
203           file wasn't compressed, the returned value will be -1.
204
205       setCompression
206             $doc->setCompression($ziplevel);
207
           If one intends to write the document directly to a file, it is
           possible to set the compression level for a given document. This
           level can be in the range from 0 to 8. Use -1 (the default) if
           XML::LibXML should not try to compress the output.
212
213           Note that this feature will only work if libxml2 is compiled with
214           zlib support and toFile() is used for output.
215
216       toString
217             $docstring = $dom->toString($format);
218
219           toString is a DOM serializing function, so the DOM Tree is
220           serialized into an XML string, ready for output.
221
222           IMPORTANT: unlike toString for other nodes, on document nodes this
223           function returns the XML as a byte string in the original encoding
224           of the document (see the actualEncoding() method)! This means you
225           can simply do:
226
227             open my $out_fh, '>', $file;
228             print {$out_fh} $doc->toString;
229
230           regardless of the actual encoding of the document. See the section
231           on encodings in XML::LibXML for more details.
232
           The optional $format parameter sets the indenting of the output.
           This parameter is expected to be an "integer" value that specifies
           whether and how indentation should be used. If it is given, the
           format parameter can have three different values:

           If $format is 0, then the document is dumped as it was originally
           parsed.

           If $format is 1, libxml2 will add ignorable white space, so the
           nodes' content is easier to read. Existing text nodes will not be
           altered.

           If $format is 2 (or higher), libxml2 will act as with $format == 1,
           but it adds a leading and a trailing line break to each text node.

           libxml2 uses a hard-coded indentation of 2 space characters per
           indentation level. This value cannot be altered at run-time.
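
           The effect of the $format values 0 and 1 can be compared with a
           small sketch like the following. The sample markup is invented
           for the example and contains no text nodes, so libxml2 is free to
           add indentation.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<root><child><leaf/></child></root>' );

my $compact  = $doc->toString(0);    # as parsed, no added white space
my $indented = $doc->toString(1);    # two spaces per nesting level

print "--- format 0 ---\n$compact";
print "--- format 1 ---\n$indented";
```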
250
251       toStringC14N
252             $c14nstr = $doc->toStringC14N($comment_flag, $xpath [, $xpath_context ]);
253
254           See the documentation in XML::LibXML::Node.
255
256       toStringEC14N
257             $ec14nstr = $doc->toStringEC14N($comment_flag, $xpath [, $xpath_context ], $inclusive_prefix_list);
258
259           See the documentation in XML::LibXML::Node.
260
261       serialize
262             $str = $doc->serialize($format);
263
           An alias for toString(). This function name was added to be more
           consistent with libxml2.
266
267       serialize_c14n
268           An alias for toStringC14N().
269
270       serialize_exc_c14n
271           An alias for toStringEC14N().
272
273       toFile
274             $state = $doc->toFile($filename, $format);
275
           This function is similar to toString(), but it writes the document
           directly to a file. It is very useful if one needs to store large
           documents.
279
280           The format parameter has the same behaviour as in toString().
281
282       toFH
283             $state = $doc->toFH($fh, $format);
284
285           This function is similar to toString(), but it writes the document
286           directly to a filehandle or a stream. A byte stream in the document
287           encoding is passed to the file handle. Do NOT apply any
288           :encoding(...) or ":utf8" PerlIO layer to the filehandle! See the
289           section on encodings in XML::LibXML for more details.
290
291           The format parameter has the same behaviour as in toString().
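
           As a sketch of both output paths, the example below writes the
           same document once with toFile() and once with toFH(). It assumes
           a temporary directory from File::Temp; the file names and the
           slurp() helper are made up for the example. The filehandle is
           opened with the ":raw" layer so that no encoding layer touches
           the bytes.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => '<root><child/></root>' );
my $dir = tempdir( CLEANUP => 1 );

# Write by file name.
my $path_a = "$dir/via_toFile.xml";
$doc->toFile( $path_a, 0 );

# Write through a filehandle without any encoding layer.
my $path_b = "$dir/via_toFH.xml";
open my $fh, '>:raw', $path_b or die "Cannot open $path_b: $!";
$doc->toFH( $fh, 0 );
close $fh or die "Cannot close $path_b: $!";

# Helper for this example: read a file back as raw bytes.
sub slurp {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    local $/;
    my $content = <$in>;
    close $in;
    return $content;
}

my $from_file = slurp($path_a);
my $from_fh   = slurp($path_b);
print "both files are identical\n" if $from_file eq $from_fh;
```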
292
293       toStringHTML
294             $str = $document->toStringHTML();
295
           toStringHTML serializes the tree to a byte string in the document
           encoding as HTML. With this method indenting is automatic and
           managed by libxml2 internally. Note that the document must contain
           <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
           (rather than the newer <meta charset="utf-8">), otherwise all
           non-ASCII characters will become entities.
302
303       serialize_html
304             $str = $document->serialize_html();
305
306           An alias for toStringHTML().
307
308       is_valid
309             $bool = $dom->is_valid();
310
           Returns either TRUE or FALSE depending on whether the DOM Tree is a
           valid Document or not.

           You may also pass in an XML::LibXML::Dtd object to validate against
           an external DTD:

             if (!$dom->is_valid($dtd)) {
                 warn("document is not valid!");
             }
320
321       validate
322             $dom->validate();
323
           This is an exception-throwing equivalent of is_valid. If the
           document is not valid, it will throw an exception containing the
           error. This allows much better error reporting than the plain
           true-or-false result of is_valid.

           Again, you may pass in a DTD object.
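
           The two validation methods can be contrasted with a sketch like
           the one below. The tiny DTD and both sample documents are
           invented for the example; the DTD object is built with
           XML::LibXML::Dtd->parse_string().

```perl
use strict;
use warnings;
use XML::LibXML;

# A made-up DTD: <root> must contain exactly one empty <child>.
my $dtd = XML::LibXML::Dtd->parse_string(<<'DTD');
<!ELEMENT root (child)>
<!ELEMENT child EMPTY>
DTD

my $good = XML::LibXML->load_xml( string => '<root><child/></root>' );
my $bad  = XML::LibXML->load_xml( string => '<root><other/></root>' );

# is_valid() only reports success or failure.
my $good_ok = $good->is_valid($dtd) ? 1 : 0;
my $bad_ok  = $bad->is_valid($dtd)  ? 1 : 0;
print "good document valid: $good_ok\n";
print "bad document valid:  $bad_ok\n";

# validate() throws, so the reason for the failure is available in $@.
my $passed = eval { $bad->validate($dtd); 1 };
my $error  = $passed ? '' : "$@";
print "validate() reported: $error\n" unless $passed;
```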
330
331       documentElement
332             $root = $dom->documentElement();
333
           Returns the root element of the Document. A document can have just
           one root element to contain the document's data.

           Alternatively, one can use getDocumentElement.
338
339       setDocumentElement
340             $dom->setDocumentElement( $root );
341
342           This function enables you to set the root element for a document.
343           The function supports the import of a node from a different
344           document tree, but does not support a document fragment as $root.
345
346       createElement
347             $element = $dom->createElement( $nodename );
348
349           This function creates a new Element Node bound to the DOM with the
350           name $nodename.
351
352       createElementNS
353             $element = $dom->createElementNS( $namespaceURI, $nodename );
354
355           This function creates a new Element Node bound to the DOM with the
356           name $nodename and placed in the given namespace.
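
           A brief sketch of a namespaced element, using the SVG namespace
           URI and the prefix 'svg' purely as sample values:

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $rect = $doc->createElementNS( 'http://www.w3.org/2000/svg', 'svg:rect' );

# The qualified name is split into prefix and local name.
print 'nodeName:     ', $rect->nodeName,     "\n";
print 'prefix:       ', $rect->prefix,       "\n";
print 'localname:    ', $rect->localname,    "\n";
print 'namespaceURI: ', $rect->namespaceURI, "\n";
print 'serialized:   ', $rect->toString,     "\n";
```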
357
358       createTextNode
359             $text = $dom->createTextNode( $content_text );
360
           Like createElement, but it creates a Text Node bound to the DOM.
363
364       createComment
365             $comment = $dom->createComment( $comment_text );
366
           Like createElement, but it creates a Comment Node bound to the
           DOM.
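
           Taken together, the create* methods above are enough to build a
           small document from scratch. The following is only a sketch; the
           element names and contents are invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $doc->createElement('library');
$doc->setDocumentElement($root);

# Nodes are created by the document, then attached where needed.
my $book = $doc->createElement('book');
$book->appendChild( $doc->createTextNode('Dune') );

$root->appendChild( $doc->createComment(' first entry ') );
$root->appendChild($book);

my $xml = $doc->toString(0);
print $xml;
```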
369
370       createAttribute
371             $attrnode = $doc->createAttribute($name [,$value]);
372
373           Creates a new Attribute node.
374
375       createAttributeNS
             $attrnode = $doc->createAttributeNS( $namespaceURI, $name [,$value] );
377
378           Creates an Attribute bound to a namespace.
379
380       createDocumentFragment
381             $fragment = $doc->createDocumentFragment();
382
383           This function creates a DocumentFragment.
384
385       createCDATASection
386             $cdata = $dom->createCDATASection( $cdata_content );
387
388           Similar to createTextNode and createComment, this function creates
389           a CDataSection bound to the current DOM.
390
391       createProcessingInstruction
392             my $pi = $doc->createProcessingInstruction( $target, $data );
393
           Creates a processing instruction node.

           Since the name of this method is quite long, one may use its short
           form createPI().
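
           As a sketch, a stylesheet processing instruction could be created
           and placed before the root element like this. The target, the
           data string and the file name 'style.xsl' are sample values, not
           anything the module requires.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );

# Create the processing instruction and attach it to the document node.
my $pi = $doc->createProcessingInstruction(
    'xml-stylesheet', 'type="text/xsl" href="style.xsl"' );
$doc->appendChild($pi);

# The root element is added afterwards, so it follows the instruction.
$doc->setDocumentElement( $doc->createElement('page') );

my $xml = $doc->toString(0);
print $xml;
```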
398
399       createEntityReference
400             my $entref = $doc->createEntityReference($refname);
401
           If a document has a DTD specified, one can create entity references
           by using this function. If one wants to add an entity reference to
           the document, this reference has to be created by this function.

           An entity reference is unique to a document and cannot be passed to
           other documents as other nodes can be passed.

           NOTE: Text content containing something that looks like an entity
           reference will not be expanded to a real entity reference unless
           it is a predefined entity.

             my $string = "&foo;";
             $some_element->appendText( $string );
             print $some_element->toString; # the text is serialized as "&amp;foo;"
416
417       createInternalSubset
418             $dtd = $document->createInternalSubset( $rootnode, $public, $system);
419
420           This function creates and adds an internal subset to the given
421           document.  Because the function automatically adds the DTD to the
422           document there is no need to add the created node explicitly to the
423           document.
424
             my $document = XML::LibXML::Document->new();
             my $dtd      = $document->createInternalSubset( "foo", undef, "foo.dtd" );

           will result in the following XML document:

             <?xml version="1.0"?>
             <!DOCTYPE foo SYSTEM "foo.dtd">
432
433           By setting the public parameter it is possible to set PUBLIC DTDs
434           to a given document. So
435
436             my $document = XML::LibXML::Document->new();
437             my $dtd      = $document->createInternalSubset( "foo", "-//FOO//DTD FOO 0.1//EN", undef );
438
439           will cause the following declaration to be created on the document:
440
441             <?xml version="1.0"?>
442             <!DOCTYPE foo PUBLIC "-//FOO//DTD FOO 0.1//EN">
443
444       createExternalSubset
445             $dtd = $document->createExternalSubset( $rootnode_name, $publicId, $systemId);
446
447           This function is similar to createInternalSubset() but this DTD is
448           considered to be external and is therefore not added to the
449           document itself. Nevertheless it can be used for validation
450           purposes.
451
452       importNode
453             $document->importNode( $node );
454
455           If a node is not part of a document, it can be imported to another
456           document. As specified in DOM Level 2 Specification the Node will
457           not be altered or removed from its original document
458           ("$node->cloneNode(1)" will get called implicitly).
459
460           NOTE: Don't try to use importNode() to import sub-trees that
461           contain an entity reference - even if the entity reference is the
462           root node of the sub-tree. This will cause serious problems to your
463           program. This is a limitation of libxml2 and not of XML::LibXML
464           itself.
465
466       adoptNode
467             $document->adoptNode( $node );
468
           If a node is not part of a document, it can be imported to another
           document. As specified in the DOM Level 3 Specification, the node
           will not be altered, but it will be removed from its original
           document.

           After a document has adopted a node, the node, its attributes and
           all its descendants belong to the new document. Because the node
           no longer belongs to the old document, it will be unlinked from its
           old location first.

           NOTE: Don't try to use adoptNode() to import sub-trees that contain
           entity references - even if the entity reference is the root node
           of the sub-tree. This will cause serious problems to your program.
           This is a limitation of libxml2 and not of XML::LibXML itself.
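
           The difference between importNode() and adoptNode() can be
           sketched as follows: the first copies a node, the second moves
           it. The two sample documents are invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $src = XML::LibXML->load_xml(
    string => '<src><item id="1"/><item id="2"/></src>' );
my $dst = XML::LibXML->load_xml( string => '<dst/>' );

my ( $first, $second ) = $src->documentElement->getChildrenByTagName('item');

# importNode() returns a copy; the original stays in the source document.
my $copy = $dst->importNode($first);
$dst->documentElement->appendChild($copy);

# adoptNode() unlinks the node from the source document and moves it.
my $moved = $dst->adoptNode($second);
$dst->documentElement->appendChild($moved);

my $src_xml = $src->documentElement->toString;
my $dst_xml = $dst->documentElement->toString;
print "source:      $src_xml\n";
print "destination: $dst_xml\n";
```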
482
483       externalSubset
484             my $dtd = $doc->externalSubset;
485
486           If a document has an external subset defined it will be returned by
487           this function.
488
           NOTE: Dtd nodes are not ordinary nodes in libxml2. The support for
           these nodes in XML::LibXML is still limited. In particular, one may
           not want to use common node functions on doctype declaration nodes!
492
493       internalSubset
494             my $dtd = $doc->internalSubset;
495
496           If a document has an internal subset defined it will be returned by
497           this function.
498
           NOTE: Dtd nodes are not ordinary nodes in libxml2. The support for
           these nodes in XML::LibXML is still limited. In particular, one may
           not want to use common node functions on doctype declaration nodes!
502
503       setExternalSubset
504             $doc->setExternalSubset($dtd);
505
506           EXPERIMENTAL!
507
508           This method sets a DTD node as an external subset of the given
509           document.
510
511       setInternalSubset
512             $doc->setInternalSubset($dtd);
513
514           EXPERIMENTAL!
515
516           This method sets a DTD node as an internal subset of the given
517           document.
518
519       removeExternalSubset
520             my $dtd = $doc->removeExternalSubset();
521
522           EXPERIMENTAL!
523
524           If a document has an external subset defined it can be removed from
525           the document by using this function. The removed dtd node will be
526           returned.
527
528       removeInternalSubset
529             my $dtd = $doc->removeInternalSubset();
530
531           EXPERIMENTAL!
532
533           If a document has an internal subset defined it can be removed from
534           the document by using this function. The removed dtd node will be
535           returned.
536
537       getElementsByTagName
538             my @nodelist = $doc->getElementsByTagName($tagname);
539
540           Implements the DOM Level 2 function
541
542           In SCALAR context this function returns an XML::LibXML::NodeList
543           object.
544
545       getElementsByTagNameNS
546             my @nodelist = $doc->getElementsByTagNameNS($nsURI,$tagname);
547
548           Implements the DOM Level 2 function
549
550           In SCALAR context this function returns an XML::LibXML::NodeList
551           object.
552
553       getElementsByLocalName
554             my @nodelist = $doc->getElementsByLocalName($localname);
555
           Fetches all nodes of a given document that have the given local
           name.
558
559           In SCALAR context this function returns an XML::LibXML::NodeList
560           object.
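
           The three lookup methods differ in how they treat namespaces, and
           all of them are context sensitive. The sketch below uses an
           invented sample document with one unprefixed and one prefixed
           'item' element; the namespace URI 'urn:example:x' is a
           placeholder.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string =>
      '<r xmlns:x="urn:example:x"><item/><x:item/><other/></r>' );

# List context: a plain list of nodes.
my @by_name  = $doc->getElementsByTagName('item');
my @by_ns    = $doc->getElementsByTagNameNS( 'urn:example:x', 'item' );
my @by_local = $doc->getElementsByLocalName('item');

# Scalar context: an XML::LibXML::NodeList object.
my $list = $doc->getElementsByLocalName('item');

printf "by tag name:      %d\n", scalar @by_name;
printf "by namespace:     %d\n", scalar @by_ns;
printf "by local name:    %d\n", scalar @by_local;
printf "NodeList size:    %d\n", $list->size;
```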
561
562       getElementById
563             my $node = $doc->getElementById($id);
564
565           Returns the element that has an ID attribute with the given value.
566           If no such element exists, this returns undef.
567
568           Note: the ID of an element may change while manipulating the
569           document. For documents with a DTD, the information about ID
570           attributes is only available if DTD loading/validation has been
571           requested. For HTML documents parsed with the HTML parser ID
572           detection is done automatically. In XML documents, all "xml:id"
573           attributes are considered to be of type ID. You can test ID-ness of
574           an attribute node with $attr->isId().
575
576           In versions 1.59 and earlier this method was called
577           getElementsById() (plural) by mistake. Starting from 1.60 this name
578           is maintained as an alias only for backward compatibility.
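
           A small sketch of the "xml:id" rule described above. The sample
           document is invented; it has one element with an "xml:id"
           attribute and one with a plain "id" attribute, and no DTD is
           loaded.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<r><a xml:id="one">first</a><b id="two">second</b></r>' );

# "xml:id" attributes are treated as IDs without any DTD.
my $found = $doc->getElementById('one');

# A plain "id" attribute is not an ID here, because no DTD declares it.
my $plain = $doc->getElementById('two');

print 'xml:id lookup: ', ( $found ? $found->nodeName : '(undef)' ), "\n";
print 'plain id lookup: ', ( $plain ? $plain->nodeName : '(undef)' ), "\n";
```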
579
580       indexElements
581             $dom->indexElements();
582
583           This function causes libxml2 to stamp all elements in a document
584           with their document position index which considerably speeds up
585           XPath queries for large documents. It should only be used with
586           static documents that won't be further changed by any DOM methods,
587           because once a document is indexed, XPath will always prefer the
588           index to other methods of determining the document order of nodes.
589           XPath could therefore return improperly ordered node-lists when
590           applied on a document that has been changed after being indexed. It
591           is of course possible to use this method to re-index a modified
592           document before using it with XPath again. This function is not a
593           part of the DOM specification.
594
           This function returns the number of elements indexed, -1 if an
           error occurred, or -2 if this feature is not available in the
           running libxml2.
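
           A hedged sketch of how the return value might be checked before
           running XPath queries on a static document. The sample markup is
           invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => '<r><a/><b><c/></b></r>' );

# Index once, after the document has reached its final shape.
my $count = $doc->indexElements;

if ( $count >= 0 ) {
    print "indexed $count elements\n";
}
elsif ( $count == -2 ) {
    print "indexing is not available in this libxml2\n";
}
else {
    print "indexing failed\n";
}

# XPath results come back in document order.
my @names = map { $_->nodeName } $doc->findnodes('//*');
print join( ',', @names ), "\n";
```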
598

AUTHORS

600       Matt Sergeant, Christian Glahn, Petr Pajas
601

VERSION

       2.0208


COPYRIGHT

       2001-2007, AxKit.com Ltd.

       2002-2006, Christian Glahn.

       2006-2009, Petr Pajas.
611

LICENSE

613       This program is free software; you can redistribute it and/or modify it
614       under the same terms as Perl itself.
615
616
617
perl v5.36.0                      2023-01-20          XML::LibXML::Document(3)