XML::DOM(3)           User Contributed Perl Documentation          XML::DOM(3)

NAME

       XML::DOM - A Perl module for building DOM Level 1 compliant document
       structures

SYNOPSIS

        use XML::DOM;

        my $parser = XML::DOM::Parser->new;
        my $doc = $parser->parsefile ("file.xml");

        # Print the HREF attribute of every CODEBASE element
        my $nodes = $doc->getElementsByTagName ("CODEBASE");
        my $n = $nodes->getLength;

        for (my $i = 0; $i < $n; $i++)
        {
            my $node = $nodes->item ($i);
            my $href = $node->getAttributeNode ("HREF");
            # getAttributeNode returns undef when the attribute is absent
            print $href->getValue . "\n" if defined $href;
        }

        # Write the document to a file
        $doc->printToFile ("out.xml");

        # Print the document as a string
        print $doc->toString;

        # Avoid memory leaks - clean up circular references for garbage collection
        $doc->dispose;

DESCRIPTION

       This module extends the XML::Parser module by Clark Cooper.  The
       XML::Parser module is built on top of XML::Parser::Expat, which is a
       lower level interface to James Clark's expat library.

       XML::DOM::Parser is derived from XML::Parser. It parses XML strings or
       files and builds a data structure that conforms to the API of the
       Document Object Model as described at
       http://www.w3.org/TR/REC-DOM-Level-1.  See the XML::Parser manpage for
       other available features of the XML::DOM::Parser class.  Note that the
       'Style' property should not be used (it is set internally).

       The XML::Parser NoExpand option is more or less supported, in that it
       will generate EntityReference objects whenever an entity reference is
       encountered in character data. I'm not sure how useful this is. Any
       comments are welcome.

       As shown in the synopsis, the parse and parsefile methods of an
       XML::DOM::Parser object create an XML::DOM::Document object from the
       specified input. This Document object can then be examined, modified
       and written back out to a file or converted to a string.
       
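The synopsis parses a file. As a minimal sketch of the same cycle for an in-memory string (parse, examine, modify, convert back), the following may help. The XML content and the "version" attribute are made up for illustration.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical input; any well-formed XML string would do.
my $xml = '<config><item name="a">one</item><item name="b">two</item></config>';

my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse($xml);      # returns an XML::DOM::Document

# Examine: read the tag name of the root element.
my $root = $doc->getDocumentElement;
print "root element: ", $root->getTagName, "\n";

# Modify: set an attribute on the root element.
$root->setAttribute("version", "2");

# Convert the whole document back to a string.
my $out = $doc->toString;
print $out;

# Break circular references so the tree can be garbage collected.
$doc->dispose;
```

The string is captured before dispose is called, since the Document should not be used afterwards.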
       When using XML::DOM with XML::Parser version 2.19 and up, setting the
       XML::DOM::Parser option KeepCDATA to 1 will store CDATA sections in
       CDATASection nodes, instead of converting them to Text nodes.
       Consecutive CDATASection nodes will be merged into one. Let me know if
       this is a problem.
       
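To see the effect of KeepCDATA, one could parse the same input with the option off and on and compare the type of the resulting child node. This is only a sketch; the input string and the %kind_for hash are illustrative names.

```perl
use strict;
use warnings;
use XML::DOM;

my $xml = '<doc><![CDATA[a < b]]></doc>';   # hypothetical input

# Parse the same input with and without KeepCDATA and record the type
# of the first child node of the root element.
my %kind_for;
for my $keep (0, 1) {
    my $parser = XML::DOM::Parser->new(KeepCDATA => $keep);
    my $doc    = $parser->parse($xml);
    my $type   = $doc->getDocumentElement->getFirstChild->getNodeType;

    my $kind = "other";
    if ($type == CDATA_SECTION_NODE) {
        $kind = "CDATASection";
    }
    elsif ($type == TEXT_NODE) {
        $kind = "Text";
    }
    $kind_for{$keep} = $kind;
    print "KeepCDATA => $keep: first child is a $kind node\n";

    $doc->dispose;
}
```

The node type constants are exported by XML::DOM, so they can be used unqualified as in the Usage example of the constant definitions.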
       When using XML::Parser 2.27 and above, you can suppress expansion of
       parameter entity references (e.g. %pent;) in the DTD by setting
       ParseParamEnt to 1 and ExpandParamEnt to 0. See Hidden Nodes for
       details.

       A Document has a tree structure consisting of Node objects. A Node may
       contain other nodes, depending on its type.  A Document may have
       Element, Text, Comment, and CDATASection nodes.  Element nodes may have
       Attr, Element, Text, Comment, and CDATASection nodes.  The other nodes
       may not have any child nodes.
       
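The tree structure described above can be explored with a small recursive walk. The sketch below assumes a tiny made-up document; walk() is a hypothetical helper and not part of XML::DOM.

```perl
use strict;
use warnings;
use XML::DOM;

# Recursively collect one line per Element or non-blank Text node,
# indented by depth. walk() is a hypothetical helper, not part of XML::DOM.
sub walk {
    my ($node, $depth, $lines) = @_;
    my $type = $node->getNodeType;

    if ($type == ELEMENT_NODE) {
        push @$lines, ("  " x $depth) . "<" . $node->getTagName . ">";
    }
    elsif ($type == TEXT_NODE) {
        my $text = $node->getData;
        push @$lines, ("  " x $depth) . "text: $text" if $text =~ /\S/;
    }

    # In list context getChildNodes returns the child nodes as a Perl list.
    walk($_, $depth + 1, $lines) for $node->getChildNodes;
}

my $doc = XML::DOM::Parser->new->parse(
    '<a><b>hello</b><c><d>world</d></c></a>');   # hypothetical input

my @lines;
walk($doc->getDocumentElement, 0, \@lines);
print "$_\n" for @lines;

$doc->dispose;
```

Nodes that cannot have children simply yield an empty list, so the recursion needs no special case for them.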
       This module adds several node types that are not part of the DOM spec
       (yet).  These are: ElementDecl (for <!ELEMENT ...> declarations),
       AttlistDecl (for <!ATTLIST ...> declarations), XMLDecl (for <?xml ...?>
       declarations) and AttDef (for attribute definitions in an AttlistDecl).

XML::DOM Classes

       The XML::DOM module stores XML documents in a tree structure with a
       root node of type XML::DOM::Document. Different nodes in the tree
       represent different parts of the XML file. The DOM Level 1
       Specification defines the following node types:

       ·   XML::DOM::Node - Super class of all node types

       ·   XML::DOM::Document - The root of the XML document

       ·   XML::DOM::DocumentType - Describes the document structure:
           <!DOCTYPE root [ ... ]>

       ·   XML::DOM::Element - An XML element: <elem attr="val"> ... </elem>

       ·   XML::DOM::Attr - An XML element attribute: name="value"

       ·   XML::DOM::CharacterData - Super class of Text, Comment and
           CDATASection

       ·   XML::DOM::Text - Text in an XML element

       ·   XML::DOM::CDATASection - Escaped block of text: <![CDATA[ text ]]>

       ·   XML::DOM::Comment - An XML comment: <!-- comment -->

       ·   XML::DOM::EntityReference - Refers to an ENTITY: &ent; or %ent;

       ·   XML::DOM::Entity - An ENTITY definition: <!ENTITY ...>

       ·   XML::DOM::ProcessingInstruction - <?PI target>

       ·   XML::DOM::DocumentFragment - Lightweight node for cut & paste

       ·   XML::DOM::Notation - A NOTATION definition: <!NOTATION ...>

       In addition, the XML::DOM module contains the following nodes that are
       not part of the DOM Level 1 Specification:

       ·   XML::DOM::ElementDecl - Defines an element: <!ELEMENT ...>

       ·   XML::DOM::AttlistDecl - Defines one or more attributes in an
           <!ATTLIST ...>

       ·   XML::DOM::AttDef - Defines one attribute in an <!ATTLIST ...>

       ·   XML::DOM::XMLDecl - An XML declaration: <?xml version="1.0" ...?>

       Other classes that are part of the DOM Level 1 Spec:

       ·   XML::DOM::Implementation - Provides information about this
           implementation. Currently it doesn't do much.

       ·   XML::DOM::NodeList - Used internally to store a node's child nodes.
           Also returned by getElementsByTagName.

       ·   XML::DOM::NamedNodeMap - Used internally to store an element's
           attributes.

       Other classes that are not part of the DOM Level 1 Spec:

       ·   XML::DOM::Parser - A non-validating XML parser that creates
           XML::DOM::Documents

       ·   XML::DOM::ValParser - A validating XML parser that creates
           XML::DOM::Documents. It uses XML::Checker to check against the
           DocumentType (DTD)

       ·   XML::Handler::BuildDOM - A PerlSAX handler that creates
           XML::DOM::Documents.
       
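Two of the classes listed above, NodeList and NamedNodeMap, are usually met indirectly. The following sketch shows one way they might appear in ordinary code; the input document and the @report array are made up for illustration.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse(
    '<list><item id="1" lang="en"/><item id="2"/></list>');  # hypothetical input

# In scalar context getElementsByTagName returns an XML::DOM::NodeList.
my $items = $doc->getElementsByTagName("item");

my @report;
for my $i (0 .. $items->getLength - 1) {
    my $elem = $items->item($i);

    # getAttributes returns the XML::DOM::NamedNodeMap of the element.
    my $attrs = $elem->getAttributes;
    my @pairs;
    for my $j (0 .. $attrs->getLength - 1) {
        my $attr = $attrs->item($j);           # an XML::DOM::Attr
        push @pairs, $attr->getName . "=" . $attr->getValue;
    }

    # Sort so the output does not depend on the map's internal order.
    push @report, join(",", sort @pairs);
}
print "$_\n" for @report;

$doc->dispose;
```

The attribute pairs are sorted explicitly because this sketch makes no assumption about the order in which a NamedNodeMap returns its items.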

XML::DOM package

       Constant definitions
           The following predefined constants indicate the type of a node.

        UNKNOWN_NODE (0)                The node type is unknown (not part of DOM)

        ELEMENT_NODE (1)                The node is an Element.
        ATTRIBUTE_NODE (2)              The node is an Attr.
        TEXT_NODE (3)                   The node is a Text node.
        CDATA_SECTION_NODE (4)          The node is a CDATASection.
        ENTITY_REFERENCE_NODE (5)       The node is an EntityReference.
        ENTITY_NODE (6)                 The node is an Entity.
        PROCESSING_INSTRUCTION_NODE (7) The node is a ProcessingInstruction.
        COMMENT_NODE (8)                The node is a Comment.
        DOCUMENT_NODE (9)               The node is a Document.
        DOCUMENT_TYPE_NODE (10)         The node is a DocumentType.
        DOCUMENT_FRAGMENT_NODE (11)     The node is a DocumentFragment.
        NOTATION_NODE (12)              The node is a Notation.

        ELEMENT_DECL_NODE (13)          The node is an ElementDecl (not part of DOM)
        ATT_DEF_NODE (14)               The node is an AttDef (not part of DOM)
        XML_DECL_NODE (15)              The node is an XMLDecl (not part of DOM)
        ATTLIST_DECL_NODE (16)          The node is an AttlistDecl (not part of DOM)

        Usage:

          if ($node->getNodeType == ELEMENT_NODE)
          {
              print "It's an Element";
          }

       Not In DOM Spec: The DOM Spec does not mention UNKNOWN_NODE and, quite
       frankly, you should never encounter it. The last 4 node types were
       added to support the 4 added node classes.

   Global Variables
       $VERSION
           The variable $XML::DOM::VERSION contains the version number of this
           implementation, e.g. "1.43".

   METHODS
       These methods are not part of the DOM Level 1 Specification.

       getIgnoreReadOnly and ignoreReadOnly (readOnly)
           The DOM Level 1 Spec does not allow you to edit certain sections of
           the document, e.g. the DocumentType, so by default this
           implementation throws DOMExceptions (i.e.
           NO_MODIFICATION_ALLOWED_ERR) when you try to edit a readonly node.
           These readonly checks can be disabled by (temporarily) setting the
           global IgnoreReadOnly flag.

           The ignoreReadOnly method sets the global IgnoreReadOnly flag and
           returns its previous value. The getIgnoreReadOnly method simply
           returns its current value.

            my $oldIgnore = XML::DOM::ignoreReadOnly (1);
            eval {
                # ... do whatever you want, catching any other exceptions ...
            };
            XML::DOM::ignoreReadOnly ($oldIgnore);     # restore previous value

           Another way to do it, using a local variable:

            { # start new scope
               local $XML::DOM::IgnoreReadOnly = 1;
               # ... do whatever you want, don't worry about exceptions ...
            } # end of scope ($IgnoreReadOnly is set back to its previous value)

       isValidName (name)
           Whether the specified name is a valid "Name" as specified in the
           XML spec.  Characters with Unicode values > 127 are now also
           supported.

       getAllowReservedNames and allowReservedNames (boolean)
           The first method returns whether reserved names are allowed.  The
           second takes a boolean argument and sets whether reserved names are
           allowed.  The initial value is 1 (i.e. allow reserved names).

           The XML spec states that "Names" starting with (X|x)(M|m)(L|l) are
           reserved for future use. (Amusingly enough, the XML version of the
           XML spec (REC-xml-19980210.xml) breaks that very rule by defining
           an ENTITY with the name 'xmlpio'.)  A "Name" in this context means
           the Name token as found in the BNF rules in the XML spec.

           XML::DOM only checks for errors when you modify the DOM tree, not
           when the DOM tree is built by the XML::DOM::Parser.

       setTagCompression (funcref)
           There are 3 possible styles for printing empty Element tags:

           Style 0
                <empty/> or <empty attr="val"/>

               XML::DOM uses this style by default for all Elements.

           Style 1
                <empty></empty> or <empty attr="val"></empty>

           Style 2
                <empty /> or <empty attr="val" />

               This style is sometimes desired when using XHTML.  (Note the
               extra space before the slash "/".) See
               <http://www.w3.org/TR/xhtml1> Appendix C for more details.

           By default XML::DOM compresses all empty Element tags (style 0).
           You can control which style is used for a particular Element by
           calling XML::DOM::setTagCompression with a reference to a function
           that takes 2 arguments. The first is the tag name of the Element,
           the second is the XML::DOM::Element that is being printed.  The
           function should return 0, 1 or 2 to indicate which style should be
           used to print the empty tag. E.g.

            XML::DOM::setTagCompression (\&my_tag_compression);

            sub my_tag_compression
            {
               my ($tag, $elem) = @_;

               # Print empty br, hr and img tags like this: <br />
               return 2 if $tag =~ /^(br|hr|img)$/;

               # Print other empty tags like this: <empty></empty>
               return 1;
            }
       
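The isValidName and allowReservedNames entries above come without an example. A minimal sketch, assuming both are called as plain functions in the XML::DOM package like ignoreReadOnly above, might look like this. The candidate names are made up, and the sketch does not show what happens when a reserved name is rejected.

```perl
use strict;
use warnings;
use XML::DOM;

# Check a few candidate names against the XML "Name" production.
# The candidate strings are made up for illustration.
my %valid;
for my $name ("title", "my-elem_1", "1abc", "has space") {
    $valid{$name} = XML::DOM::isValidName($name) ? 1 : 0;
    printf "%-10s %s\n", $name, $valid{$name} ? "valid" : "not valid";
}

# Temporarily disallow reserved names (those starting with "xml"),
# then restore the previous setting.
my $old = XML::DOM::getAllowReservedNames();
XML::DOM::allowReservedNames(0);
# ... code that modifies the DOM tree would go here ...
XML::DOM::allowReservedNames($old);
```

Saving the old value through getAllowReservedNames avoids relying on any return value of allowReservedNames, which the text above does not describe.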

IMPLEMENTATION DETAILS

       ·   Perl Mappings

           The value undef is used where the DOM Spec says null.

           The DOM Spec says: Applications must encode DOMString using UTF-16
           (defined in Appendix C.3 of [UNICODE] and Amendment 1 of
           [ISO-10646]).  In this implementation we use plain old Perl strings
           encoded in UTF-8 instead of UTF-16.

       ·   Text and CDATASection nodes

           The Expat parser expands entity references and CDATA sections to
           raw strings and does not indicate where they were found.  This
           implementation therefore converts both to Text nodes at parse
           time.  CDATASection and EntityReference nodes that are added to an
           existing Document (by the user) will be preserved.

           Also, consecutive Text nodes are always merged at parse time. Text
           nodes that are added later can be merged with the normalize method.
           Consider using the addText method when adding Text nodes.

       ·   Printing and toString

           When printing (and converting an XML Document to a string) the
           strings have to be encoded differently depending on where they
           occur.  E.g. in a CDATASection all substrings are allowed except
           for "]]>".  In regular text, certain characters are not allowed,
           e.g. ">" has to be converted to "&gt;".  These routines should be
           verified by someone who knows the details.

       ·   Quotes

           Certain sections in XML are quoted, like attribute values in an
           Element.  XML::Parser strips these quotes and the print methods in
           this implementation always use double quotes, so when parsing and
           printing a document, single quotes may be converted to double
           quotes. The default value of an attribute definition (AttDef) in an
           AttlistDecl, however, will maintain its quotes.

       ·   AttlistDecl

           Attribute declarations for a certain Element are always merged into
           a single AttlistDecl object.

       ·   Comments

           Comments in the DOCTYPE section are not kept in the right place.
           They will become child nodes of the Document.

       ·   Hidden Nodes

           Previous versions of XML::DOM would expand parameter entity
           references (like %pent;), so when printing the DTD, it would print
           the contents of the external entity, instead of the parameter
           entity reference.  With this release (1.27), you can prevent this
           by setting the XML::DOM::Parser options ParseParamEnt => 1 and
           ExpandParamEnt => 0.

           When the parser processes the contents of the external entities,
           it *DOES* still add the nodes to the DocumentType, but it marks
           these nodes by setting the 'Hidden' property. In addition, it adds
           an EntityReference node to the DocumentType node.

           When printing the DocumentType node (or when using to_expat() or
           to_sax()), the 'Hidden' nodes are suppressed, so you will see the
           parameter entity reference instead of the contents of the external
           entities. See test case t/dom_extent.t for an example.

           The reason for adding the 'Hidden' nodes to the DocumentType node
           is that the nodes may contain <!ENTITY> definitions that are
           referenced further in the document. (Simply not adding the nodes to
           the DocumentType could cause such entity references to be expanded
           incorrectly.)

           Note that you need XML::Parser 2.27 or higher for this to work
           correctly.
       
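The Text node behaviour described under "Text and CDATASection nodes" above can be sketched as follows. The example assumes a one-element document and adds only one extra Text node; count_children() is a hypothetical helper, not part of XML::DOM.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper: number of direct child nodes of a node.
sub count_children {
    my @kids = $_[0]->getChildNodes;
    return scalar @kids;
}

my $doc  = XML::DOM::Parser->new->parse('<p>Hello</p>');   # hypothetical input
my $root = $doc->getDocumentElement;

# createTextNode plus appendChild adds a separate Text node next to
# the one created by the parser.
$root->appendChild($doc->createTextNode(" world"));
my $before = count_children($root);

# normalize merges the two adjacent Text nodes into one.
$root->normalize;
my $after = count_children($root);

# addText appends to the last child when that child is a Text node,
# so no additional node is created.
$root->addText("!");
my $final = count_children($root);

my $str = $root->toString;
print "children: $before before normalize, $after after, $final after addText\n";
print "$str\n";

$doc->dispose;
```

Using addText from the start would avoid the intermediate state with two adjacent Text nodes, which is presumably why the text above recommends it.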

SEE ALSO

       XML::DOM::XPath

       The Japanese version of this document by Takanori Kawai (Hippo2000) at
       <http://member.nifty.ne.jp/hippo2000/perltips/xml/dom.htm>

       The DOM Level 1 specification at <http://www.w3.org/TR/REC-DOM-Level-1>

       The XML spec (Extensible Markup Language 1.0) at
       <http://www.w3.org/TR/REC-xml>

       The XML::Parser and XML::Parser::Expat manual pages.

       XML::LibXML also provides a DOM Parser, and is significantly faster
       than XML::DOM, and is under active development.  It requires that you
       download the Gnome libxml library.

       XML::GDOME will provide the DOM Level 2 Core API, and should be as fast
       as XML::LibXML, but more robust, since it uses the memory management
       functions of libgdome.  For more details see
       <http://tjmather.com/xml-gdome/>

CAVEATS

       The method getElementsByTagName() does not return a "live" NodeList.
       Whether this is an actual caveat is debatable, but a few people on the
       www-dom mailing list seemed to think so. I haven't decided yet. It's a
       pain to implement, it slows things down and the benefits seem marginal.
       Let me know what you think.
       
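As a small illustration of what "not live" means in practice, the sketch below takes a NodeList, adds a matching element afterwards and compares the old list with a fresh query. The document and variable names are made up.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<r><x/><x/></r>');  # hypothetical input
my $root = $doc->getDocumentElement;

# Take a NodeList, then add another matching element to the tree.
my $list       = $doc->getElementsByTagName("x");
my $len_before = $list->getLength;

$root->appendChild($doc->createElement("x"));

# The earlier list is a snapshot; only a fresh query sees the new element.
my $len_stale = $list->getLength;
my $len_fresh = $doc->getElementsByTagName("x")->getLength;

print "old list: $len_before -> $len_stale, fresh query: $len_fresh\n";

$doc->dispose;
```

Code that modifies the tree while iterating may therefore want to call getElementsByTagName again after each change rather than reuse an earlier list.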

AUTHOR

       Enno Derksen is the original author.

       Send patches to T.J. Mather at <tjmather@maxmind.com>.

       Paid support is available directly from the maintainers of this
       package.  Please see <http://www.maxmind.com/app/opensourceservices>
       for more details.

       Thanks to Clark Cooper for his help with the initial version.



perl v5.16.3                      2005-07-26                       XML::DOM(3)