XML::DOM(3)           User Contributed Perl Documentation           XML::DOM(3)

NAME
    XML::DOM - A Perl module for building DOM Level 1 compliant document
    structures

SYNOPSIS
        use XML::DOM;

        my $parser = XML::DOM::Parser->new;
        my $doc = $parser->parsefile ("file.xml");

        # Print all HREF attributes of all CODEBASE elements
        my $nodes = $doc->getElementsByTagName ("CODEBASE");
        my $n = $nodes->getLength;

        for (my $i = 0; $i < $n; $i++)
        {
            my $node = $nodes->item ($i);
            my $href = $node->getAttributeNode ("HREF");
            # getAttributeNode returns undef when the attribute is absent
            print $href->getValue . "\n" if defined $href;
        }

        # Print doc to a file
        $doc->printToFile ("out.xml");

        # Print to string
        print $doc->toString;

        # Avoid memory leaks - clean up circular references for garbage collection
        $doc->dispose;

DESCRIPTION
    This module extends the XML::Parser module by Clark Cooper. The
    XML::Parser module is built on top of XML::Parser::Expat, which is a
    lower-level interface to James Clark's expat library.

    XML::DOM::Parser is derived from XML::Parser. It parses XML strings or
    files and builds a data structure that conforms to the API of the
    Document Object Model as described at
    <http://www.w3.org/TR/REC-DOM-Level-1>. See the XML::Parser manpage
    for other available features of the XML::DOM::Parser class. Note that
    the 'Style' property should not be used (it is set internally).

    The XML::Parser NoExpand option is more or less supported, in that it
    will generate EntityReference objects whenever an entity reference is
    encountered in character data. I'm not sure how useful this is. Any
    comments are welcome.

    As described in the synopsis, when you create an XML::DOM::Parser
    object, the parse and parsefile methods create an XML::DOM::Document
    object from the specified input. This Document object can then be
    examined, modified and written back out to a file or converted to a
    string.

    When using XML::DOM with XML::Parser version 2.19 and up, setting the
    XML::DOM::Parser option KeepCDATA to 1 will store CDATASections in
    CDATASection nodes, instead of converting them to Text nodes.
    Subsequent CDATASection nodes will be merged into one. Let me know if
    this is a problem.

    When using XML::Parser 2.27 and above, you can suppress expansion of
    parameter entity references (e.g. %pent;) in the DTD by setting
    ParseParamEnt to 1 and ExpandParamEnt to 0. See Hidden Nodes for
    details.

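Parser options such as KeepCDATA are passed to the constructor as name/value pairs. The sketch below assumes XML::Parser 2.19 or later, as stated above, and uses a made-up "script" element to check that the CDATA section survives as a CDATASection node.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical input containing one CDATA section
my $xml = '<script><![CDATA[if (a < b) { run(); }]]></script>';

# KeepCDATA => 1 asks the parser to build CDATASection nodes
my $parser = XML::DOM::Parser->new (KeepCDATA => 1);
my $doc    = $parser->parse ($xml);

my $child    = $doc->getDocumentElement->getFirstChild;
my $is_cdata = $child->getNodeType == CDATA_SECTION_NODE;
my $data     = $child->getData;

print "first child is a CDATASection\n" if $is_cdata;
print "data: $data\n";

$doc->dispose;
```

Without the option, the same content would be expected to arrive as an ordinary Text node.
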
    A Document has a tree structure consisting of Node objects. A Node may
    contain other nodes, depending on its type. A Document may have
    Element, Text, Comment, and CDATASection nodes. Element nodes may have
    Attr, Element, Text, Comment, and CDATASection nodes. The other nodes
    may not have any child nodes.

    This module adds several node types that are not part of the DOM spec
    (yet). These are: ElementDecl (for <!ELEMENT ...> declarations),
    AttlistDecl (for <!ATTLIST ...> declarations), XMLDecl (for <?xml ...?>
    declarations) and AttDef (for attribute definitions in an AttlistDecl).

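Because every part of the document is a Node with zero or more children, the tree can be visited with a short recursive routine. This is only a sketch: the walk function and the sample document are hypothetical, and the routine relies on getChildNodes returning a plain list when called in list context.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical sample document
my $xml = '<a><b>text</b><!-- note --><c/></a>';
my $doc = XML::DOM::Parser->new->parse ($xml);

# Depth-first traversal: return one line per node, indented by depth.
sub walk
{
    my ($node, $depth) = @_;
    my @lines = (("  " x $depth) . $node->getNodeName);
    for my $child ($node->getChildNodes)     # list context: a plain list
    {
        push @lines, walk ($child, $depth + 1);
    }
    return @lines;
}

my @lines = walk ($doc->getDocumentElement, 0);
print "$_\n" for @lines;

$doc->dispose;
```

Nodes that cannot have children, such as Text and Comment nodes, simply end the recursion.
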
XML::DOM Classes
    The XML::DOM module stores XML documents in a tree structure with a
    root node of type XML::DOM::Document. Different nodes in the tree
    represent different parts of the XML file. The DOM Level 1
    Specification defines the following node types:

    * XML::DOM::Node - Super class of all node types
    * XML::DOM::Document - The root of the XML document
    * XML::DOM::DocumentType - Describes the document structure:
      <!DOCTYPE root [ ... ]>
    * XML::DOM::Element - An XML element: <elem attr="val"> ... </elem>
    * XML::DOM::Attr - An XML element attribute: name="value"
    * XML::DOM::CharacterData - Super class of Text, Comment and
      CDATASection
    * XML::DOM::Text - Text in an XML element
    * XML::DOM::CDATASection - Escaped block of text: <![CDATA[ text ]]>
    * XML::DOM::Comment - An XML comment: <!-- comment -->
    * XML::DOM::EntityReference - Refers to an ENTITY: &ent; or %ent;
    * XML::DOM::Entity - An ENTITY definition: <!ENTITY ...>
    * XML::DOM::ProcessingInstruction - A processing instruction:
      <?target data?>
    * XML::DOM::DocumentFragment - Lightweight node for cut & paste
    * XML::DOM::Notation - A NOTATION definition: <!NOTATION ...>

    In addition, the XML::DOM module contains the following nodes that are
    not part of the DOM Level 1 Specification:

    * XML::DOM::ElementDecl - Defines an element: <!ELEMENT ...>
    * XML::DOM::AttlistDecl - Defines one or more attributes in an
      <!ATTLIST ...>
    * XML::DOM::AttDef - Defines one attribute in an <!ATTLIST ...>
    * XML::DOM::XMLDecl - An XML declaration: <?xml version="1.0" ...?>

    Other classes that are part of the DOM Level 1 Spec:

    * XML::DOM::Implementation - Provides information about this
      implementation. Currently it doesn't do much.
    * XML::DOM::NodeList - Used internally to store a node's child nodes.
      Also returned by getElementsByTagName.
    * XML::DOM::NamedNodeMap - Used internally to store an element's
      attributes.

    Other classes that are not part of the DOM Level 1 Spec:

    * XML::DOM::Parser - A non-validating XML parser that creates
      XML::DOM::Documents
    * XML::DOM::ValParser - A validating XML parser that creates
      XML::DOM::Documents. It uses XML::Checker to check against the
      DocumentType (DTD)
    * XML::Handler::BuildDOM - A PerlSAX handler that creates
      XML::DOM::Documents.

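Several of the classes listed above are normally created through factory methods on the Document rather than constructed directly. The following sketch builds a small tree that way. The element and attribute names ("library", "book", "isbn") and their values are invented for the example.

```perl
use strict;
use warnings;
use XML::DOM;

# Start from a minimal, hypothetical document
my $doc  = XML::DOM::Parser->new->parse ('<library/>');
my $root = $doc->getDocumentElement;                 # XML::DOM::Element

# The Document acts as the factory for new nodes
my $book = $doc->createElement ("book");             # XML::DOM::Element
$book->setAttribute ("isbn", "0000000000");          # kept as an XML::DOM::Attr
$book->appendChild ($doc->createTextNode ("A title"));      # XML::DOM::Text

$root->appendChild ($doc->createComment ("added by hand")); # XML::DOM::Comment
$root->appendChild ($book);

my $out = $root->toString;
print "$out\n";

$doc->dispose;
```

Nodes created by one Document belong to that Document, so the factory call and the appendChild call should use the same $doc.
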
XML::DOM package
  Constant definitions
    The following predefined constants indicate the type of a node.

        UNKNOWN_NODE (0)                 The node type is unknown (not part of DOM)

        ELEMENT_NODE (1)                 The node is an Element.
        ATTRIBUTE_NODE (2)               The node is an Attr.
        TEXT_NODE (3)                    The node is a Text node.
        CDATA_SECTION_NODE (4)           The node is a CDATASection.
        ENTITY_REFERENCE_NODE (5)        The node is an EntityReference.
        ENTITY_NODE (6)                  The node is an Entity.
        PROCESSING_INSTRUCTION_NODE (7)  The node is a ProcessingInstruction.
        COMMENT_NODE (8)                 The node is a Comment.
        DOCUMENT_NODE (9)                The node is a Document.
        DOCUMENT_TYPE_NODE (10)          The node is a DocumentType.
        DOCUMENT_FRAGMENT_NODE (11)      The node is a DocumentFragment.
        NOTATION_NODE (12)               The node is a Notation.

        ELEMENT_DECL_NODE (13)           The node is an ElementDecl (not part of DOM)
        ATT_DEF_NODE (14)                The node is an AttDef (not part of DOM)
        XML_DECL_NODE (15)               The node is an XMLDecl (not part of DOM)
        ATTLIST_DECL_NODE (16)           The node is an AttlistDecl (not part of DOM)

    Usage:

        if ($node->getNodeType == ELEMENT_NODE)
        {
            print "It's an Element";
        }

    Not In DOM Spec: The DOM Spec does not mention UNKNOWN_NODE and, quite
    frankly, you should never encounter it. The last 4 node types were
    added to support the 4 added node classes.

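A common use of these constants is to tell real elements apart from the whitespace Text nodes that the parser creates between them. The sketch below counts the children of a made-up "list" element by type. It assumes the constants are exported into the caller's namespace by "use XML::DOM", as the usage fragment above suggests.

```perl
use strict;
use warnings;
use XML::DOM;     # assumed to export the node type constants

# Hypothetical input: the whitespace between elements becomes Text nodes
my $xml = "<list>\n  <item/>\n  <item/>\n  <!-- end -->\n</list>";
my $doc = XML::DOM::Parser->new->parse ($xml);

my %count = (element => 0, text => 0, comment => 0, other => 0);

for my $child ($doc->getDocumentElement->getChildNodes)
{
    my $type = $child->getNodeType;
    if    ($type == ELEMENT_NODE) { $count{element}++ }
    elsif ($type == TEXT_NODE)    { $count{text}++ }
    elsif ($type == COMMENT_NODE) { $count{comment}++ }
    else                          { $count{other}++ }
}

printf "%-8s %d\n", $_, $count{$_} for sort keys %count;

$doc->dispose;
```

Code that only cares about elements can skip every child whose type is not ELEMENT_NODE.
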
  Global Variables
    $VERSION
        The variable $XML::DOM::VERSION contains the version number of this
        implementation, e.g. "1.43".

  METHODS
    These methods are not part of the DOM Level 1 Specification.

    getIgnoreReadOnly and ignoreReadOnly (readOnly)
        The DOM Level 1 Spec does not allow you to edit certain sections of
        the document, e.g. the DocumentType, so by default this
        implementation throws DOMExceptions (i.e.
        NO_MODIFICATION_ALLOWED_ERR) when you try to edit a read-only node.
        These read-only checks can be disabled by (temporarily) setting the
        global IgnoreReadOnly flag.

        The ignoreReadOnly method sets the global IgnoreReadOnly flag and
        returns its previous value. The getIgnoreReadOnly method simply
        returns its current value.

            my $oldIgnore = XML::DOM::ignoreReadOnly (1);
            eval {
                # ... do whatever you want, catching any other exceptions ...
            };
            XML::DOM::ignoreReadOnly ($oldIgnore);     # restore previous value

        Another way to do it, using a local variable:

            { # start new scope
                local $XML::DOM::IgnoreReadOnly = 1;
                # ... do whatever you want, don't worry about exceptions ...
            } # end of scope ($IgnoreReadOnly is set back to its previous value)

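The save / eval / restore pattern can be wrapped in a small helper so the flag is always put back, even when the enclosed code dies. The helper name with_readonly_ignored is hypothetical and not part of XML::DOM; it uses only the two functions described above.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper (not part of XML::DOM): run a code reference with
# the read-only checks disabled, restoring the flag even if the code dies.
sub with_readonly_ignored
{
    my ($code) = @_;
    my $old = XML::DOM::ignoreReadOnly (1);
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    XML::DOM::ignoreReadOnly ($old);        # restore previous value
    die $err unless $ok;                    # re-throw after restoring
    return;
}

my $before = XML::DOM::getIgnoreReadOnly ();
my $inside;
with_readonly_ignored (sub { $inside = XML::DOM::getIgnoreReadOnly () });
my $after = XML::DOM::getIgnoreReadOnly ();

print "flag inside the block: $inside\n";
print "flag restored\n" if $before == $after;
```

The "local" form shown earlier achieves the same restoration with less code; the helper is mainly useful when the flag has to be set around a callback.
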
    isValidName (name)
        Returns whether the specified name is a valid "Name" as specified
        in the XML spec. Characters with Unicode values > 127 are now also
        supported.

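A quick way to get a feel for isValidName is to run it over a few candidate names before using them as tag or attribute names. The candidate list below is invented for illustration. The function is called with its package name, in the same style as the ignoreReadOnly example.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical candidate names, checked before they are used as tag names
my @candidates = ("title", "sub-title", "_private", "1st", "has space");

my %valid = map { $_ => (XML::DOM::isValidName ($_) ? 1 : 0) } @candidates;

for my $name (@candidates)
{
    printf "%-12s %s\n", "'$name'", $valid{$name} ? "valid" : "invalid";
}
```

Per the Name production in the XML spec, a name may not start with a digit and may not contain whitespace, so the last two candidates are rejected.
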
    getAllowReservedNames and allowReservedNames (boolean)
        The first method returns whether reserved names are allowed. The
        second takes a boolean argument and sets whether reserved names are
        allowed. The initial value is 1 (i.e. allow reserved names).

        The XML spec states that "Names" starting with (X|x)(M|m)(L|l) are
        reserved for future use. (Amusingly enough, the XML version of the
        XML spec (REC-xml-19980210.xml) breaks that very rule by defining
        an ENTITY with the name 'xmlpio'.) A "Name" in this context means
        the Name token as found in the BNF rules in the XML spec.

        XML::DOM only checks for errors when you modify the DOM tree, not
        when the DOM tree is built by the XML::DOM::Parser.

    setTagCompression (funcref)
        There are 3 possible styles for printing empty Element tags:

        Style 0
            <empty/> or <empty attr="val"/>

            XML::DOM uses this style by default for all Elements.

        Style 1
            <empty></empty> or <empty attr="val"></empty>

        Style 2
            <empty /> or <empty attr="val" />

            This style is sometimes desired when using XHTML. (Note the
            extra space before the slash "/".) See
            <http://www.w3.org/TR/xhtml1> Appendix C for more details.

        By default XML::DOM compresses all empty Element tags (style 0).
        You can control which style is used for a particular Element by
        calling XML::DOM::setTagCompression with a reference to a function
        that takes 2 arguments. The first is the tag name of the Element,
        the second is the XML::DOM::Element that is being printed. The
        function should return 0, 1 or 2 to indicate which style should be
        used to print the empty tag. E.g.

            XML::DOM::setTagCompression (\&my_tag_compression);

            sub my_tag_compression
            {
                my ($tag, $elem) = @_;

                # Print empty br, hr and img tags like this: <br />
                return 2 if $tag =~ /^(br|hr|img)$/;

                # Print other empty tags like this: <empty></empty>
                return 1;
            }

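To see the effect of such a callback, it helps to print a small fragment that contains empty elements of both kinds. This sketch uses the same rule as the callback above, written as an anonymous sub, on a made-up XHTML-like fragment. Note that the setting is global and affects all later printing in the same program.

```perl
use strict;
use warnings;
use XML::DOM;

# Same rule as the callback above, written as an anonymous sub
XML::DOM::setTagCompression (sub {
    my ($tag, $elem) = @_;
    return 2 if $tag =~ /^(br|hr|img)$/;   # style 2: <br />
    return 1;                              # style 1: <empty></empty>
});

# Hypothetical XHTML-like fragment with two empty elements
my $doc = XML::DOM::Parser->new->parse ('<p><br/><span/></p>');
my $out = $doc->getDocumentElement->toString;
print "$out\n";

$doc->dispose;
```

The callback is only consulted for elements without children, so the enclosing p element is printed with its normal start and end tags.
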
IMPLEMENTATION DETAILS
    * Perl Mappings
        The value undef is used where the DOM Spec says null.

        The DOM Spec says: Applications must encode DOMString using UTF-16
        (defined in Appendix C.3 of [UNICODE] and Amendment 1 of
        [ISO-10646]). In this implementation we use plain old Perl strings
        encoded in UTF-8 instead of UTF-16.

    * Text and CDATASection nodes
        The Expat parser expands EntityReferences and CDATASection sections
        to raw strings and does not indicate where they were found. This
        implementation therefore converts both to Text nodes at parse
        time. CDATASection and EntityReference nodes that are added to an
        existing Document (by the user) will be preserved.

        Also, adjacent Text nodes are always merged at parse time. Text
        nodes that are added later can be merged with the normalize method.
        Consider using the addText method when adding Text nodes.

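The difference between appendChild, normalize and addText is easiest to see by counting child nodes. The sketch below starts from a hypothetical empty "p" element. It assumes that addText appends to a trailing Text node when one exists, which is what the recommendation above implies.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse ('<p/>');   # hypothetical input
my $p   = $doc->getDocumentElement;

# Two separate appendChild calls leave two adjacent Text nodes
$p->appendChild ($doc->createTextNode ("Hello, "));
$p->appendChild ($doc->createTextNode ("world"));
my $before = $p->getChildNodes->getLength;

# normalize merges adjacent Text nodes into one
$p->normalize;
my $after = $p->getChildNodes->getLength;

# addText extends the trailing Text node instead of adding a new node
$p->addText ("!");
my $final = $p->getChildNodes->getLength;
my $text  = $p->getFirstChild->getData;

print "children: $before, then $after, then $final\n";
print "text: $text\n";

$doc->dispose;
```

Using addText from the start avoids the need for a later normalize pass.
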
    * Printing and toString
        When printing (and converting an XML Document to a string) the
        strings have to be encoded differently depending on where they
        occur. E.g. in a CDATASection all substrings are allowed except for
        "]]>". In regular text, certain characters are not allowed, e.g.
        ">" has to be converted to "&gt;". These routines should be
        verified by someone who knows the details.

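The encoding happens only on output; the nodes themselves hold the raw characters. A small sketch, using a made-up "expr" element, that compares the stored data with the printed form:

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse ('<expr/>');   # hypothetical input
my $expr = $doc->getDocumentElement;

# The Text node stores the raw characters ...
$expr->appendChild ($doc->createTextNode ("a < b && c"));
my $raw = $expr->getFirstChild->getData;

# ... and toString encodes them for use in character data
my $out = $expr->toString;

print "raw:     $raw\n";
print "encoded: $out\n";

$doc->dispose;
```

Callers should therefore pass unescaped text to createTextNode and leave the escaping to the print methods.
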
    * Quotes
        Certain sections in XML are quoted, like attribute values in an
        Element. XML::Parser strips these quotes and the print methods in
        this implementation always use double quotes, so when parsing and
        printing a document, single quotes may be converted to double
        quotes. The default value of an attribute definition (AttDef) in an
        AttlistDecl, however, will maintain its quotes.

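A round trip through the parser shows the quote conversion. The "link" element and its attribute values below are invented for the example.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical input that uses single quotes around attribute values
my $xml = "<link href='a.html' rel='next'/>";

my $doc = XML::DOM::Parser->new->parse ($xml);
my $out = $doc->getDocumentElement->toString;

print "in:  $xml\n";
print "out: $out\n";

$doc->dispose;
```

Tools that compare the input and output byte for byte will report a difference here, even though the two documents are equivalent XML.
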
    * AttlistDecl
        Attribute declarations for a certain Element are always merged into
        a single AttlistDecl object.

    * Comments
        Comments in the DOCTYPE section are not kept in the right place.
        They will become child nodes of the Document.

    * Hidden Nodes
        Previous versions of XML::DOM would expand parameter entity
        references (like %pent;), so when printing the DTD, it would print
        the contents of the external entity, instead of the parameter
        entity reference. With this release (1.27), you can prevent this by
        setting the XML::DOM::Parser options ParseParamEnt => 1 and
        ExpandParamEnt => 0.

        When it is parsing the contents of the external entities, it *DOES*
        still add the nodes to the DocumentType, but it marks these nodes
        by setting the 'Hidden' property. In addition, it adds an
        EntityReference node to the DocumentType node.

        When printing the DocumentType node (or when using to_expat() or
        to_sax()), the 'Hidden' nodes are suppressed, so you will see the
        parameter entity reference instead of the contents of the external
        entities. See test case t/dom_extent.t for an example.

        The reason for adding the 'Hidden' nodes to the DocumentType node
        is that the nodes may contain <!ENTITY> definitions that are
        referenced further in the document. (Simply not adding the nodes to
        the DocumentType could cause such entity references to be expanded
        incorrectly.)

        Note that you need XML::Parser 2.27 or higher for this to work
        correctly.

SEE ALSO
    XML::DOM::XPath

    The Japanese version of this document by Takanori Kawai (Hippo2000) at
    <http://member.nifty.ne.jp/hippo2000/perltips/xml/dom.htm>

    The DOM Level 1 specification at <http://www.w3.org/TR/REC-DOM-Level-1>

    The XML spec (Extensible Markup Language 1.0) at
    <http://www.w3.org/TR/REC-xml>

    The XML::Parser and XML::Parser::Expat manual pages.

    XML::LibXML also provides a DOM Parser, and is significantly faster
    than XML::DOM, and is under active development. It requires that you
    download the Gnome libxml library.

    XML::GDOME will provide the DOM Level 2 Core API, and should be as fast
    as XML::LibXML, but more robust, since it uses the memory management
    functions of libgdome. For more details see
    <http://tjmather.com/xml-gdome/>

CAVEATS
    The method getElementsByTagName() does not return a "live" NodeList.
    Whether this is an actual caveat is debatable, but a few people on the
    www-dom mailing list seemed to think so. I haven't decided yet. It's a
    pain to implement, it slows things down and the benefits seem marginal.
    Let me know what you think.

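In practice, "not live" means that a NodeList obtained earlier does not pick up elements added afterwards. The sketch below illustrates this with a made-up document; the element names "r" and "i" are arbitrary.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse ('<r><i/><i/></r>');  # hypothetical
my $root = $doc->getDocumentElement;

# Take a NodeList, then change the tree
my $list  = $doc->getElementsByTagName ("i");
my $first = $list->getLength;

$root->appendChild ($doc->createElement ("i"));

# The old list is a snapshot; a new call reflects the change
my $stale = $list->getLength;
my $fresh = $doc->getElementsByTagName ("i")->getLength;

print "before: $first, old list after insert: $stale, new list: $fresh\n";

$doc->dispose;
```

Code that modifies the tree while iterating should therefore call getElementsByTagName again whenever it needs an up-to-date list.
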
AUTHOR
    Enno Derksen is the original author.

    Send patches to T.J. Mather at <tjmather@maxmind.com>.

    Paid support is available directly from the maintainers of this
    package. Please see <http://www.maxmind.com/app/opensourceservices>
    for more details.

    Thanks to Clark Cooper for his help with the initial version.

perl v5.8.8                       2002-02-08                       XML::DOM(3)