XML::LibXML::Document(3)   User Contributed Perl Documentation   XML::LibXML::Document(3)

NAME
    XML::LibXML::Document - XML::LibXML DOM Document Class

SYNOPSIS
      use XML::LibXML;
      # Only methods specific to Document nodes are listed here,
      # see the XML::LibXML::Node manpage for other methods

      $dom = XML::LibXML::Document->new( $version, $encoding );
      $dom = XML::LibXML::Document->createDocument( $version, $encoding );
      $strURI = $doc->URI();
      $doc->setURI($strURI);
      $strEncoding = $doc->encoding();
      $strEncoding = $doc->actualEncoding();
      $doc->setEncoding($new_encoding);
      $strVersion = $doc->version();
      $doc->standalone;
      $doc->setStandalone($numvalue);
      my $compression = $doc->compression;
      $doc->setCompression($ziplevel);
      $docstring = $dom->toString($format);
      $c14nstr = $doc->toStringC14N($comment_flag, $xpath [, $xpath_context ]);
      $ec14nstr = $doc->toStringEC14N($comment_flag, $xpath [, $xpath_context ], $inclusive_prefix_list);
      $str = $doc->serialize($format);
      $state = $doc->toFile($filename, $format);
      $state = $doc->toFH($fh, $format);
      $str = $document->toStringHTML();
      $str = $document->serialize_html();
      $bool = $dom->is_valid();
      $dom->validate();
      $root = $dom->documentElement();
      $dom->setDocumentElement( $root );
      $element = $dom->createElement( $nodename );
      $element = $dom->createElementNS( $namespaceURI, $nodename );
      $text = $dom->createTextNode( $content_text );
      $comment = $dom->createComment( $comment_text );
      $attrnode = $doc->createAttribute($name [,$value]);
      $attrnode = $doc->createAttributeNS( $namespaceURI, $name [,$value] );
      $fragment = $doc->createDocumentFragment();
      $cdata = $dom->createCDATASection( $cdata_content );
      my $pi = $doc->createProcessingInstruction( $target, $data );
      my $entref = $doc->createEntityReference($refname);
      $dtd = $document->createInternalSubset( $rootnode, $public, $system);
      $dtd = $document->createExternalSubset( $rootnode_name, $publicId, $systemId);
      $document->importNode( $node );
      $document->adoptNode( $node );
      my $dtd = $doc->externalSubset;
      my $dtd = $doc->internalSubset;
      $doc->setExternalSubset($dtd);
      $doc->setInternalSubset($dtd);
      my $dtd = $doc->removeExternalSubset();
      my $dtd = $doc->removeInternalSubset();
      my @nodelist = $doc->getElementsByTagName($tagname);
      my @nodelist = $doc->getElementsByTagNameNS($nsURI,$tagname);
      my @nodelist = $doc->getElementsByLocalName($localname);
      my $node = $doc->getElementById($id);
      $dom->indexElements();

DESCRIPTION
    A Document object is in most cases the result of a parsing process,
    but sometimes a document has to be created from scratch. The DOM
    Document class provides functions that follow the DOM Core naming
    style.

    It inherits all functions from XML::LibXML::Node as specified in the
    DOM specification. This gives access to document-level nodes other
    than the root element, such as a DTD. Support for these nodes is
    currently limited.

    In the DOM model, nodes are generally bound to a document, so it is
    advisable to create nodes through a document (for example with the
    create* methods described here) rather than unbound. The node does not
    have to be inserted into the document tree; once it is bound to a
    document, its strings can be relied on to have the correct encoding.
    If an unbound text node is created from an ISO encoded string (e.g.
    with $CLASS->new()), the "toString" function may not return the
    expected result.

    To prevent such problems, it is recommended to pass all data to
    XML::LibXML methods as character strings (i.e. UTF-8 encoded, with the
    UTF8 flag on).

METHODS
    Many functions listed here are documented extensively in the DOM Level
    3 specification (<http://www.w3.org/TR/DOM-Level-3-Core/>). Please
    refer to the specification for details.

    new
          $dom = XML::LibXML::Document->new( $version, $encoding );

        Alias for createDocument().

    createDocument
          $dom = XML::LibXML::Document->createDocument( $version, $encoding );

        The constructor for the document class. It takes the version
        string and the encoding string as parameters. Calling
        createDocument() with both creates the document:

          <?xml version="your version" encoding="your encoding"?>

        Both parameters are optional. The default value for $version is
        1.0. If $encoding is not given, the encoding is left unset, which
        implies UTF-8.

        Calling createDocument() without any parameters results in the
        following declaration:

          <?xml version="1.0"?>

        Alternatively, this constructor can be called directly on the
        XML::LibXML class to save some typing. This does not affect the
        class of the returned instance, which is always
        XML::LibXML::Document.

          my $document = XML::LibXML->createDocument( "1.0", "UTF-8" );

        is therefore a shortcut for

          my $document = XML::LibXML::Document->createDocument( "1.0", "UTF-8" );
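
As a minimal sketch of creating a document from scratch, the following builds a one-element document and serializes it. The element name "greeting" and its text are illustrative assumptions, not part of the API.

```perl
use strict;
use warnings;
use XML::LibXML;

# Create an empty document with an explicit version and encoding.
my $doc = XML::LibXML::Document->createDocument('1.0', 'UTF-8');

# 'greeting' is a hypothetical element name used only for illustration.
my $root = $doc->createElement('greeting');
$root->appendText('hello');
$doc->setDocumentElement($root);

# On a document node, toString returns a byte string in the document encoding.
my $xml = $doc->toString;
print $xml;
```

The XML declaration in the output carries the version and encoding passed to the constructor.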

    URI
          $strURI = $doc->URI();

        Returns the URI (or filename) of the original document. For
        documents obtained by parsing a string or a filehandle without
        using the URI parsing argument of the corresponding "parse_*"
        function, the result is a generated string unknown-XYZ, where XYZ
        is some number; for documents created with the constructor "new",
        the URI is undefined.

        The value can be modified by calling the "setURI" method on the
        document node.

    setURI
          $doc->setURI($strURI);

        Sets the URI of the document reported by the method URI (see also
        the URI argument to the various "parse_*" functions).
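
A short sketch of the URI accessors on a document created with the constructor. The file URI is a hypothetical value chosen for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');

# A document created with the constructor has no URI yet.
my $before = $doc->URI;

# Hypothetical location; any string can be stored here.
$doc->setURI('file:///tmp/example.xml');
my $after = $doc->URI;

print defined $before ? "before: $before\n" : "before: (undefined)\n";
print "after: $after\n";
```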

    encoding
          $strEncoding = $doc->encoding();

        Returns the encoding string of the document.

          my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
          print $doc->encoding; # prints ISO-8859-15

    actualEncoding
          $strEncoding = $doc->actualEncoding();

        Returns the encoding in which the XML will be returned by
        $doc->toString(). This is usually the original encoding of the
        document as declared in the XML declaration and returned by
        $doc->encoding. If the original encoding is not known (e.g. if
        the document was created in memory or parsed from an XML source
        without a declared encoding), 'UTF-8' is returned.

          my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
          print $doc->actualEncoding; # prints ISO-8859-15

    setEncoding
          $doc->setEncoding($new_encoding);

        This method changes the encoding given in the XML declaration of
        the document. The value also affects the encoding in which the
        document is serialized to XML by $doc->toString(). Call
        setEncoding() without an argument to remove the encoding
        declaration.
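
The effect of setEncoding() on serialization can be sketched as follows. The element name and text are assumptions for illustration. The text is upgraded to a UTF8-flagged character string first, in line with the recommendation in the DESCRIPTION.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML->createDocument('1.0', 'UTF-8');
my $root = $doc->createElement('word');   # hypothetical element name
$doc->setDocumentElement($root);

# Pass text as a character string with the UTF8 flag on.
my $text = "caf\x{e9}";
utf8::upgrade($text);
$root->appendText($text);

# Serialized as UTF-8: the e-acute is the two bytes C3 A9.
my $utf8_bytes = $doc->toString;

# Change the declared encoding; serialization follows it.
$doc->setEncoding('ISO-8859-1');
my $latin1_bytes = $doc->toString;        # e-acute is the single byte E9

printf "UTF-8 length: %d, ISO-8859-1 length: %d\n",
    length $utf8_bytes, length $latin1_bytes;
```

Both results are byte strings, so they should be written to a filehandle without an encoding layer.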

    version
          $strVersion = $doc->version();

        Returns the version string of the document.

        getVersion() is an alternative form of this function.

    standalone
          $doc->standalone

        This function returns the numerical value of the standalone
        attribute of a document's XML declaration. It returns 1 if
        standalone="yes" was found, 0 if standalone="no" was found and -1
        if standalone was not specified (default on creation).

    setStandalone
          $doc->setStandalone($numvalue);

        This method alters the value of a document's standalone
        attribute. Set it to 1 to set standalone="yes", to 0 to set
        standalone="no", or to -1 to remove the standalone attribute from
        the XML declaration.
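
A minimal sketch of the three standalone states described above, using an otherwise empty document:

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');

# Not specified on creation.
my $initial = $doc->standalone;

# 1 maps to standalone="yes" in the XML declaration.
$doc->setStandalone(1);
my $with_yes = $doc->toString;

# -1 removes the attribute from the declaration again.
$doc->setStandalone(-1);
my $removed = $doc->toString;

print "initial value: $initial\n";
print $with_yes, $removed;
```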

    compression
          my $compression = $doc->compression;

        libxml2 allows reading of documents directly from gzipped files.
        In this case the compression variable is set to the compression
        level of that file (0-8). If XML::LibXML parsed a different source
        or the file wasn't compressed, the returned value will be -1.

    setCompression
          $doc->setCompression($ziplevel);

        If one intends to write the document directly to a file, it is
        possible to set the compression level for a given document. This
        level can be in the range from 0 to 8. If XML::LibXML should not
        try to compress, use -1 (default).

        Note that this feature will only work if libxml2 is compiled with
        zlib support and toFile() is used for output.

    toString
          $docstring = $dom->toString($format);

        toString is a DOM serializing function: the DOM tree is
        serialized into an XML string, ready for output.

        IMPORTANT: unlike toString for other nodes, on document nodes this
        function returns the XML as a byte string in the original encoding
        of the document (see the actualEncoding() method)! This means you
        can simply do:

          open my $out_fh, '>', $file or die "Cannot open $file: $!";
          print {$out_fh} $doc->toString;

        regardless of the actual encoding of the document. See the section
        on encodings in XML::LibXML for more details.

        The optional $format parameter controls the indenting of the
        output. It is expected to be an integer value that specifies
        whether and how indentation is applied. If used, it can take
        three different values:

        If $format is 0, then the document is dumped as it was originally
        parsed.

        If $format is 1, libxml2 will add ignorable white space, so the
        content of the nodes is easier to read. Existing text nodes will
        not be altered.

        If $format is 2 (or higher), libxml2 will act as for $format == 1
        but it also adds a leading and a trailing line break to each text
        node.

        libxml2 uses a hard-coded indentation of 2 space characters per
        indentation level. This value cannot be altered at run time.
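
The difference between $format values 0 and 1 can be sketched with a small parsed document. The input string is an assumption for illustration; it contains no whitespace between elements, so libxml2 is free to add indentation.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input without any whitespace-only text nodes.
my $doc = XML::LibXML->load_xml(string => '<a><b>x</b><c/></a>');

# 0: dump the tree as it was parsed.
my $compact = $doc->toString(0);

# 1: add ignorable white space, two spaces per level.
my $pretty = $doc->toString(1);

print $compact, "\n", $pretty;
```

With $format set to 1, the child elements appear on their own lines, indented by two spaces.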

    toStringC14N
          $c14nstr = $doc->toStringC14N($comment_flag, $xpath [, $xpath_context ]);

        See the documentation in XML::LibXML::Node.

    toStringEC14N
          $ec14nstr = $doc->toStringEC14N($comment_flag, $xpath [, $xpath_context ], $inclusive_prefix_list);

        See the documentation in XML::LibXML::Node.

    serialize
          $str = $doc->serialize($format);

        An alias for toString(). This function name was added to be more
        consistent with libxml2.

    serialize_c14n
        An alias for toStringC14N().

    serialize_exc_c14n
        An alias for toStringEC14N().

    toFile
          $state = $doc->toFile($filename, $format);

        This function is similar to toString(), but it writes the document
        directly to a file. This is very useful if one needs to store
        large documents.

        The format parameter has the same behaviour as in toString().

    toFH
          $state = $doc->toFH($fh, $format);

        This function is similar to toString(), but it writes the document
        directly to a filehandle or a stream. A byte stream in the document
        encoding is passed to the filehandle. Do NOT apply any
        ":encoding(...)" or ":utf8" PerlIO layer to the filehandle! See the
        section on encodings in XML::LibXML for more details.

        The format parameter has the same behaviour as in toString().
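
The following sketch writes the same document with toFH() and toFile(), using temporary files from File::Temp. The helper slurp_raw and the sample document are assumptions for illustration. Note that the filehandle is put in binary mode and gets no encoding layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => '<a><b>x</b></a>');

# toFH: hand over a raw filehandle; no :encoding or :utf8 layer.
my ($fh, $fh_path) = tempfile(UNLINK => 1);
binmode $fh;
$doc->toFH($fh, 0);
close $fh or die "close failed: $!";

# toFile: let libxml2 open and write the file itself.
my ($tmp, $file_path) = tempfile(UNLINK => 1);
close $tmp;
$doc->toFile($file_path, 0);

# Hypothetical helper: read a file back as raw bytes.
sub slurp_raw {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    local $/;
    my $data = <$in>;
    close $in;
    return $data;
}

my $from_fh   = slurp_raw($fh_path);
my $from_file = slurp_raw($file_path);
print $from_fh, $from_file;
```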

    toStringHTML
          $str = $document->toStringHTML();

        toStringHTML serializes the tree to a byte string in the document
        encoding as HTML. With this method indenting is automatic and
        managed by libxml2 internally. Note that the document must contain
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        (rather than the newer <meta charset="utf-8">), otherwise all
        non-ASCII characters will be serialized as entities.

    serialize_html
          $str = $document->serialize_html();

        An alias for toStringHTML().

    is_valid
          $bool = $dom->is_valid();

        Returns either TRUE or FALSE depending on whether the DOM tree is
        a valid document or not.

        You may also pass in an XML::LibXML::Dtd object to validate
        against an external DTD:

          if (!$dom->is_valid($dtd)) {
              warn("document is not valid!");
          }

    validate
          $dom->validate();

        This is an exception-throwing equivalent of is_valid. If the
        document is not valid, it will throw an exception containing the
        error. This allows much better error reporting than the plain
        true/false result of is_valid.

        Again, you may pass in a DTD object.
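
A sketch contrasting is_valid() and validate() with a DTD object. The DTD and both sample documents are assumptions made up for this example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical DTD: a note must contain exactly one 'to' element.
my $dtd = XML::LibXML::Dtd->parse_string(<<'DTD');
<!ELEMENT note (to)>
<!ELEMENT to (#PCDATA)>
DTD

my $good = XML::LibXML->load_xml(string => '<note><to>Ann</to></note>');
my $bad  = XML::LibXML->load_xml(string => '<note><from>Bob</from></note>');

# is_valid only reports true or false.
my $good_ok = $good->is_valid($dtd);
my $bad_ok  = $bad->is_valid($dtd);

# validate throws, so the reason is available in $@.
my $error;
eval { $bad->validate($dtd); 1 } or $error = $@;

print "good document valid: ", ($good_ok ? "yes" : "no"), "\n";
print "bad document valid: ",  ($bad_ok  ? "yes" : "no"), "\n";
print "validation error: $error\n" if defined $error;
```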

    documentElement
          $root = $dom->documentElement();

        Returns the root element of the document. A document can have just
        one root element to contain the document's data.

        getDocumentElement() is an alternative form of this function.

    setDocumentElement
          $dom->setDocumentElement( $root );

        This function sets the root element for a document. It supports
        the import of a node from a different document tree, but does not
        support a document fragment as $root.

    createElement
          $element = $dom->createElement( $nodename );

        This function creates a new Element Node bound to the DOM with the
        name $nodename.

    createElementNS
          $element = $dom->createElementNS( $namespaceURI, $nodename );

        This function creates a new Element Node bound to the DOM with the
        name $nodename and placed in the given namespace.

    createTextNode
          $text = $dom->createTextNode( $content_text );

        The equivalent of createElement, but it creates a Text Node bound
        to the DOM.

    createComment
          $comment = $dom->createComment( $comment_text );

        The equivalent of createElement, but it creates a Comment Node
        bound to the DOM.

    createAttribute
          $attrnode = $doc->createAttribute($name [,$value]);

        Creates a new Attribute node.

    createAttributeNS
          $attrnode = $doc->createAttributeNS( $namespaceURI, $name [,$value] );

        Creates an Attribute bound to a namespace.

    createDocumentFragment
          $fragment = $doc->createDocumentFragment();

        This function creates a DocumentFragment.

    createCDATASection
          $cdata = $dom->createCDATASection( $cdata_content );

        Similar to createTextNode and createComment, this function creates
        a CDataSection bound to the current DOM.
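
The node factory methods above can be combined as in the following sketch. All element names, the namespace URI and the attribute are hypothetical; appendChild and setAttributeNode are inherited node and element methods.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('catalog');
$doc->setDocumentElement($root);

# Element in a (hypothetical) namespace, with a text node as content.
my $item = $doc->createElementNS('http://example.com/ns', 'ex:item');
$item->appendChild($doc->createTextNode('Widget'));
$root->appendChild($item);

# Comment and CDATA section, both bound to $doc.
$root->appendChild($doc->createComment(' generated '));
$root->appendChild($doc->createCDATASection('a < b'));

# Attribute node, attached to the root element afterwards.
my $attr = $doc->createAttribute('id', 'c1');
$root->setAttributeNode($attr);

my $xml = $doc->toString;
print $xml;
```

Each node is created bound to $doc but only becomes part of the tree once it is appended or attached.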

    createProcessingInstruction
          my $pi = $doc->createProcessingInstruction( $target, $data );

        Creates a processing instruction node.

        Since this method name is quite long, its short form createPI()
        may be used instead.

    createEntityReference
          my $entref = $doc->createEntityReference($refname);

        If a document has a DTD specified, one can create entity references
        by using this function. If one wants to add an entity reference to
        the document, this reference has to be created by this function.

        An entity reference is unique to a document and cannot be passed to
        other documents as other nodes can be passed.

        NOTE: Text content containing something that looks like an entity
        reference will not be expanded to a real entity reference unless
        it is a predefined entity.

          my $string = "&foo;";
          $some_element->appendText( $string );
          print $some_element->textContent; # prints "&foo;"
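
To make the note concrete, the following sketch compares text that merely looks like an entity reference with a node made by createEntityReference(). The element names and the entity name foo are assumptions. No DTD is declared here, so the entity itself remains undefined and the sketch only looks at how the nodes serialize.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('root');
$doc->setDocumentElement($root);

# Plain text: the ampersand is escaped on output.
my $plain = $doc->createElement('plain');
$plain->appendText('&foo;');
$root->appendChild($plain);

# A real entity reference node for the (hypothetical) entity 'foo'.
my $with_ref = $doc->createElement('ref');
$with_ref->appendChild($doc->createEntityReference('foo'));
$root->appendChild($with_ref);

my $plain_xml = $plain->toString;
my $ref_xml   = $with_ref->toString;
print "$plain_xml\n$ref_xml\n";
```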

    createInternalSubset
          $dtd = $document->createInternalSubset( $rootnode, $public, $system);

        This function creates and adds an internal subset to the given
        document. Because the function automatically adds the DTD to the
        document, there is no need to add the created node explicitly to
        the document.

          my $document = XML::LibXML::Document->new();
          my $dtd = $document->createInternalSubset( "foo", undef, "foo.dtd" );

        will result in the following XML document:

          <?xml version="1.0"?>
          <!DOCTYPE foo SYSTEM "foo.dtd">

        By setting the public parameter it is possible to set PUBLIC DTDs
        for a given document. So

          my $document = XML::LibXML::Document->new();
          my $dtd = $document->createInternalSubset( "foo", "-//FOO//DTD FOO 0.1//EN", undef );

        will cause the following declaration to be created on the document:

          <?xml version="1.0"?>
          <!DOCTYPE foo PUBLIC "-//FOO//DTD FOO 0.1//EN">

    createExternalSubset
          $dtd = $document->createExternalSubset( $rootnode_name, $publicId, $systemId);

        This function is similar to "createInternalSubset()", but this DTD
        is considered to be external and is therefore not added to the
        document itself. Nevertheless it can be used for validation
        purposes.

    importNode
          $document->importNode( $node );

        If a node is not part of a document, it can be imported to another
        document. As specified in the DOM Level 2 Specification, the node
        will not be altered or removed from its original document
        ("$node->cloneNode(1)" will get called implicitly).

        NOTE: Don't try to use importNode() to import sub-trees that
        contain an entity reference - even if the entity reference is the
        root node of the sub-tree. This will cause serious problems to your
        program. This is a limitation of libxml2 and not of XML::LibXML
        itself.

    adoptNode
          $document->adoptNode( $node );

        If a node is not part of a document, it can be imported to another
        document. As specified in the DOM Level 3 Specification, the node
        will not be altered but it will be removed from its original
        document.

        After a document has adopted a node, the node, its attributes and
        all its descendants belong to the new document. Because the node
        no longer belongs to the old document, it will be unlinked from
        its old location first.

        NOTE: Don't try to use adoptNode() to import sub-trees that contain
        entity references - even if the entity reference is the root node
        of the sub-tree. This will cause serious problems to your program.
        This is a limitation of libxml2 and not of XML::LibXML itself.
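
The difference between importNode() and adoptNode() can be sketched with two small documents. The document contents and variable names are assumptions for illustration; in both cases the node still has to be appended to appear in the target tree.

```perl
use strict;
use warnings;
use XML::LibXML;

my $src = XML::LibXML->load_xml(string => '<a><b/><c/></a>');
my $dst = XML::LibXML->load_xml(string => '<root/>');

# importNode: a copy is bound to $dst, the original stays in $src.
my ($b_node) = $src->getElementsByTagName('b');
my $copy = $dst->importNode($b_node);
$dst->documentElement->appendChild($copy);

# adoptNode: the node itself moves and is unlinked from $src.
my ($c_node) = $src->getElementsByTagName('c');
$dst->adoptNode($c_node);
$dst->documentElement->appendChild($c_node);

my @b_left = $src->getElementsByTagName('b');
my @c_left = $src->getElementsByTagName('c');
my @moved  = $dst->documentElement->childNodes;

print "source:      ", $src->documentElement->toString, "\n";
print "destination: ", $dst->documentElement->toString, "\n";
```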

    externalSubset
          my $dtd = $doc->externalSubset;

        If a document has an external subset defined, it will be returned
        by this function.

        NOTE: Dtd nodes are not ordinary nodes in libxml2. The support for
        these nodes in XML::LibXML is still limited. In particular, one
        may not want to use common node functions on doctype declaration
        nodes!

    internalSubset
          my $dtd = $doc->internalSubset;

        If a document has an internal subset defined, it will be returned
        by this function.

        NOTE: Dtd nodes are not ordinary nodes in libxml2. The support for
        these nodes in XML::LibXML is still limited. In particular, one
        may not want to use common node functions on doctype declaration
        nodes!

    setExternalSubset
          $doc->setExternalSubset($dtd);

        EXPERIMENTAL!

        This method sets a DTD node as an external subset of the given
        document.

    setInternalSubset
          $doc->setInternalSubset($dtd);

        EXPERIMENTAL!

        This method sets a DTD node as an internal subset of the given
        document.

    removeExternalSubset
          my $dtd = $doc->removeExternalSubset();

        EXPERIMENTAL!

        If a document has an external subset defined, it can be removed
        from the document by using this function. The removed dtd node
        will be returned.

    removeInternalSubset
          my $dtd = $doc->removeInternalSubset();

        EXPERIMENTAL!

        If a document has an internal subset defined, it can be removed
        from the document by using this function. The removed dtd node
        will be returned.

    getElementsByTagName
          my @nodelist = $doc->getElementsByTagName($tagname);

        Implements the DOM Level 2 function.

        In SCALAR context this function returns an XML::LibXML::NodeList
        object.

    getElementsByTagNameNS
          my @nodelist = $doc->getElementsByTagNameNS($nsURI,$tagname);

        Implements the DOM Level 2 function.

        In SCALAR context this function returns an XML::LibXML::NodeList
        object.

    getElementsByLocalName
          my @nodelist = $doc->getElementsByLocalName($localname);

        This fetches all nodes from a given document that have the given
        local name.

        In SCALAR context this function returns an XML::LibXML::NodeList
        object.
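
A sketch of how the three lookups differ on a document that mixes namespaced and plain elements. The sample document and the namespace URI urn:x are assumptions for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<r xmlns:x="urn:x"><x:item/><item/><item/></r>',
);

# List context: a plain list of nodes.
my @by_tag   = $doc->getElementsByTagName('item');           # qualified name 'item'
my @by_ns    = $doc->getElementsByTagNameNS('urn:x', 'item');
my @by_local = $doc->getElementsByLocalName('item');         # prefix ignored

# Scalar context: an XML::LibXML::NodeList object.
my $list = $doc->getElementsByLocalName('item');

printf "tag: %d, ns: %d, local: %d, list size: %d\n",
    scalar @by_tag, scalar @by_ns, scalar @by_local, $list->size;
```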

    getElementById
          my $node = $doc->getElementById($id);

        Returns the element that has an ID attribute with the given value.
        If no such element exists, this returns undef.

        Note: the ID of an element may change while manipulating the
        document. For documents with a DTD, the information about ID
        attributes is only available if DTD loading/validation has been
        requested. For HTML documents parsed with the HTML parser, ID
        detection is done automatically. In XML documents, all "xml:id"
        attributes are considered to be of type ID. You can test the
        ID-ness of an attribute node with $attr->isId().

        In versions 1.59 and earlier this method was called
        getElementsById() (plural) by mistake. Starting from 1.60 this name
        is maintained as an alias only for backward compatibility.
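
Since "xml:id" attributes are treated as IDs without a DTD, a lookup can be sketched as follows. The document content and the ID values are assumptions for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<r><p xml:id="intro">Hi</p><p>Other</p></r>',
);

# Found: the element carrying xml:id="intro".
my $found = $doc->getElementById('intro');

# Not found: undef is returned.
my $missing = $doc->getElementById('nope');

print "found: ", $found->toString, "\n" if defined $found;
print "missing: ", (defined $missing ? "defined" : "undef"), "\n";
```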

    indexElements
          $dom->indexElements();

        This function causes libxml2 to stamp all elements in a document
        with their document position index, which considerably speeds up
        XPath queries for large documents. It should only be used with
        static documents that won't be further changed by any DOM methods,
        because once a document is indexed, XPath will always prefer the
        index to other methods of determining the document order of nodes.
        XPath could therefore return improperly ordered node-lists when
        applied to a document that has been changed after being indexed.
        It is of course possible to use this method to re-index a modified
        document before using it with XPath again. This function is not
        part of the DOM specification.

        This function returns the number of elements indexed, -1 if an
        error occurred, or -2 if this feature is not available in the
        running libxml2.
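
A minimal sketch of indexing a static document before running XPath queries, with the return codes handled as described above. The sample document and the XPath expression are assumptions for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => '<r><a/><b/></r>');

# Index once, after the document has reached its final shape.
my $count = $doc->indexElements;

if ($count == -2) {
    print "indexing is not available in this libxml2\n";
}
elsif ($count == -1) {
    print "indexing failed\n";
}
else {
    print "indexed $count elements\n";
}

# XPath queries work as usual on the (unchanged) document.
my @children = $doc->findnodes('/r/*');
print scalar(@children), " child elements found\n";
```

If the document is modified afterwards, calling indexElements() again before the next query keeps the index consistent.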

AUTHORS
    Matt Sergeant, Christian Glahn, Petr Pajas

VERSION
    2.0208

COPYRIGHT
    2001-2007, AxKit.com Ltd.

    2002-2006, Christian Glahn.

    2006-2009, Petr Pajas.

LICENSE
    This program is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

perl v5.36.0                      2022-09-30          XML::LibXML::Document(3)