XML::LibXML::Reader(3) User Contributed Perl Documentation XML::LibXML::Reader(3)

NAME
       XML::LibXML::Reader - interface to libxml2 pull parser

SYNOPSIS
         use XML::LibXML::Reader;

         my $reader = XML::LibXML::Reader->new(location => "file.xml")
             or die "cannot read file.xml\n";
         while ($reader->read == 1) {
             processNode($reader);
         }

         # print depth, node type, name and emptiness of each node
         sub processNode {
             my $reader = shift;
             printf "%d %d %s %d\n", ($reader->depth,
                                      $reader->nodeType,
                                      $reader->name,
                                      $reader->isEmptyElement);
         }

       or

         my $reader = XML::LibXML::Reader->new(location => "file.xml")
             or die "cannot read file.xml\n";
         # keep only the //table/tr subtrees, then dump the collected tree
         $reader->preservePattern('//table/tr');
         $reader->finish;
         print $reader->document->toString(1);

DESCRIPTION
       This is a Perl interface to libxml2's pull-parser implementation
       xmlTextReader (http://xmlsoft.org/html/libxml-xmlreader.html). Pull
       parsers (StAX in Java, XmlReader in C#) use an iterator approach to
       parse XML files. They are easier to program than event-based parsers
       (SAX) and much more lightweight than tree-based parsers (DOM), which
       load the complete tree into memory.

       The Reader acts as a cursor moving forward on the document stream
       and stopping at each node on the way. At every point, DOM-like
       methods of the Reader object allow you to examine the current node
       (name, namespace, attributes, etc.).

       The user's code keeps control of the progress and simply calls the
       read() function repeatedly to advance to the next node in document
       order. Other functions provide means for skipping complete subtrees,
       or skipping nodes until a specific element is reached, etc.

       At any time, only a very limited portion of the document is kept in
       memory, which makes the API more memory-efficient than using DOM.
       However, it is also possible to mix Reader with DOM. At every point
       the user may copy the current node (optionally expanded into a
       complete subtree) from the processed document to another DOM tree,
       or instruct the Reader to collect a sub-document in the form of a
       DOM tree consisting of selected nodes.

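       As a minimal sketch of the copy approach, the script below streams
       through a small hypothetical document and keeps a deep DOM copy,
       obtained with copyCurrentNode(1), of each <item> element. The element
       name, the id attribute and the input string are illustrative
       assumptions, not part of the module.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical input: only the <item> elements are of interest.
my $xml = '<list><item id="1">apple</item><skip/><item id="2">pear</item></list>';

my $reader = XML::LibXML::Reader->new(string => $xml);
my @copies;

# For each <item>, keep a deep DOM copy (the element plus its text child).
while ($reader->nextElement('item') == 1) {
    push @copies, $reader->copyCurrentNode(1);
}

# The copies are ordinary XML::LibXML DOM nodes, independent of the reader.
for my $item (@copies) {
    printf "%s => %s\n", $item->getAttribute('id'), $item->textContent;
}
```
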
       The Reader API also supports namespaces, xml:base, entity handling,
       and DTD validation. RelaxNG and W3C XML Schema validation can be
       requested with the RelaxNG and Schema constructor options described
       under Parsing options below.

       The naming of methods compared to libxml2 and C# XmlTextReader has
       been changed slightly to match the conventions of XML::LibXML. Some
       functions have been changed or added with respect to the C
       interface.

CONSTRUCTOR
       Depending on the XML source, the Reader object can be created with
       any of:

         my $reader = XML::LibXML::Reader->new( location => "file.xml", ... );
         my $reader = XML::LibXML::Reader->new( string => $xml_string, ... );
         my $reader = XML::LibXML::Reader->new( IO => $file_handle, ... );
         my $reader = XML::LibXML::Reader->new( DOM => $dom, ... );

       where ... are (optional) reader options described below in Parsing
       options. The constructor recognizes the following XML sources:

   Source specification
       location
           Read XML from a local file or URL.

       string
           Read XML from a string.

       IO  Read XML from a Perl IO filehandle.

       FD  Read XML from a file descriptor (bypasses the Perl I/O layer;
           only applicable to filehandles for regular files or pipes).
           Possibly faster than IO.

       DOM Use the reader API to walk through a preparsed
           XML::LibXML::Document.

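       The source only changes how the reader is constructed; the reading
       loop stays the same. In this sketch the helper count_elements and the
       sample document are hypothetical, and XML::LibXML->new->parse_string
       is used merely to produce a document for the DOM source.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

my $xml = '<root><a/><b/></root>';

# Hypothetical helper: count element nodes seen by a reader.
sub count_elements {
    my ($reader) = @_;
    my $count = 0;
    while ($reader->read == 1) {
        $count++ if $reader->nodeType == XML_READER_TYPE_ELEMENT;
    }
    return $count;
}

# Source 1: parse directly from an in-memory string.
my $from_string = count_elements(XML::LibXML::Reader->new(string => $xml));

# Source 2: walk a document that has already been parsed into a DOM.
my $dom      = XML::LibXML->new->parse_string($xml);
my $from_dom = count_elements(XML::LibXML::Reader->new(DOM => $dom));

print "$from_string $from_dom\n";   # both readers see root, a and b
```
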
   Parsing options
       URI Can be used to provide the base URI when parsing strings or
           filehandles.

       encoding
           Override the document encoding.

       RelaxNG
           Can be used to pass either an XML::LibXML::RelaxNG object or a
           filename or URL of a RelaxNG schema to the constructor. The
           schema is then used to validate the document as it is processed.

       Schema
           Can be used to pass either an XML::LibXML::Schema object or a
           filename or URL of a W3C XSD schema to the constructor. The
           schema is then used to validate the document as it is processed.

       recover
           Recover on errors (0 or 1).

       expand_entities
           Substitute entities (0 or 1).

       load_ext_dtd
           Load the external subset (0 or 1).

       complete_attributes
           Add default attributes from the DTD (0 or 1).

       validation
           Validate with the DTD (0 or 1).

       suppress_errors
           Suppress error reports (0 or 1).

       suppress_warnings
           Suppress warning reports (0 or 1).

       pedantic_parser
           Pedantic error reporting (0 or 1).

       no_blanks
           Remove blank nodes (0 or 1).

       expand_xinclude
           Perform XInclude substitution (0 or 1).

       no_network
           Forbid network access (0 or 1).

       clean_namespaces
           Remove redundant namespace declarations (0 or 1).

       no_cdata
           Merge CDATA as text nodes (0 or 1).

       no_xinclude_nodes
           Do not generate XINCLUDE START/END nodes (0 or 1).

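       Parsing options are passed as additional key/value pairs next to the
       source specification. The sketch below, using a hypothetical
       first_child helper and an inline document, contrasts the default
       handling of a CDATA section with no_cdata => 1, which should report
       the same characters as an ordinary text node.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

my $xml = '<expr><![CDATA[1 < 2]]></expr>';

# Hypothetical helper: return type and value of the first child of <expr>.
sub first_child {
    my %options = @_;
    my $reader = XML::LibXML::Reader->new(string => $xml, %options);
    $reader->nextElement('expr');   # position on <expr>
    $reader->read;                  # step onto its first child
    return ($reader->nodeType, $reader->value);
}

my ($plain_type,  $plain_value)  = first_child();
my ($merged_type, $merged_value) = first_child(no_cdata => 1);

# By default the child is XML_READER_TYPE_CDATA (4); with no_cdata => 1 it
# is XML_READER_TYPE_TEXT (3). The character data itself is identical.
print "$plain_type $merged_type\n";
```
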
METHODS CONTROLLING PARSING PROGRESS
       read ()
           Moves the position to the next node in the stream, exposing its
           properties.

           Returns 1 if the node was read successfully, 0 if there are no
           more nodes to read, or -1 in case of error.

       readAttributeValue ()
           Parses an attribute value into one or more Text and
           EntityReference nodes.

           Returns 1 in case of success, 0 if the reader was not positioned
           on an attribute node or all the attribute values have been read,
           or -1 in case of error.

       readState ()
           Gets the read state of the reader. Returns the state value, or -1
           in case of error. The module exports constants for the Reader
           states, see STATES below.

       depth ()
           The depth of the node in the tree, starting at 0 for the root
           node.

       next ()
           Skips to the node following the current one in document order
           while avoiding the subtree, if any. Returns 1 if the node was read
           successfully, 0 if there are no more nodes to read, or -1 in case
           of error.

       nextElement (localname?,nsURI?)
           Skips nodes following the current one in document order until a
           specific element is reached. The element's name must be equal to
           the given localname if defined, and its namespace must be equal
           to the given nsURI if defined. Either argument may be undef; the
           nsURI argument, or both arguments, may also be omitted.

           Returns 1 if the element was found, 0 if there are no more nodes
           to read, or -1 in case of error.

       skipSiblings ()
           Skips all nodes on the same or lower level until the first node
           on a higher level is reached. In particular, if the current node
           occurs in an element, the reader stops at the end tag of the
           parent element, otherwise it stops at a node immediately
           following the parent node.

           Returns 1 if successful, 0 if the end of the document is reached,
           or -1 in case of error.

       nextSibling ()
           Skips to the node following the current one in document order
           while avoiding the subtree, if any.

           Returns 1 if the node was read successfully, 0 if there are no
           more nodes to read, or -1 in case of error.

       nextSiblingElement (name?,nsURI?)
           Like nextElement, but only processes sibling elements of the
           current node (moving forward using nextSibling () rather than
           read (), internally).

           Returns 1 if the element was found, 0 if there are no more
           sibling nodes, or -1 in case of error.

       finish ()
           Skips all remaining nodes in the document, reaching the end of
           the document.

           Returns 1 if successful, 0 in case of error.

       close ()
           This method releases any resources allocated by the current
           instance and closes any underlying input. It returns 0 on failure
           and 1 on success. This method is automatically called by the
           destructor when the reader is forgotten, therefore you do not
           have to call it directly.

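       The navigation methods above can be combined. This sketch (the
       catalog document and its section/item names are assumptions for
       illustration) uses nextElement() to find a starting point, next() to
       skip a whole subtree, and nextSiblingElement() to visit only sibling
       elements.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical catalog with two sections of items.
my $xml = '<catalog>'
        . '<section id="A"><item n="1"/><item n="2"/></section>'
        . '<section id="B"><title>B</title><item n="3"/><item n="4"/></section>'
        . '</catalog>';

my $reader = XML::LibXML::Reader->new(string => $xml);

# nextElement() jumps directly to the first <section>.
$reader->nextElement('section') == 1 or die "no <section> found\n";

# next() skips the whole subtree of section A and stops on section B.
$reader->next == 1 or die "unexpected end of document\n";
my $section_id    = $reader->getAttribute('id');   # 'B'
my $section_depth = $reader->depth;                # 1 (root element is 0)

# Move to the first <item> inside section B, then visit only its siblings.
$reader->nextElement('item') == 1 or die "no <item> found\n";
my @numbers = ($reader->getAttribute('n'));
while ($reader->nextSiblingElement('item') == 1) {
    push @numbers, $reader->getAttribute('n');
}

$reader->finish;   # skip whatever remains of the document
print "$section_id $section_depth @numbers\n";
```
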
METHODS EXTRACTING INFORMATION
       name ()
           Returns the qualified name of the current node, equal to
           (Prefix:)LocalName.

       nodeType ()
           Returns the type of the current node. See NODE TYPES below.

       localName ()
           Returns the local name of the node.

       prefix ()
           Returns the prefix of the namespace associated with the node.

       namespaceURI ()
           Returns the URI defining the namespace associated with the node.

       isEmptyElement ()
           Checks whether the current node is empty. This is a bit bizarre
           in the sense that <a/> will be considered empty while <a></a>
           will not.

       hasValue ()
           Returns true if the node can have a text value.

       value ()
           Provides the text value of the node if present, or undef if not
           available.

       readInnerXml ()
           Reads the contents of the current node, including child nodes and
           markup. Returns a string containing the XML of the node's
           content, or undef if the current node is neither an element nor
           an attribute, or has no child nodes.

       readOuterXml ()
           Reads the contents of the current node, including child nodes and
           markup.

           Returns a string containing the XML of the node including its
           content, or undef if the current node is neither an element nor
           an attribute.

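       A small sketch of the name accessors and of readInnerXml() versus
       readOuterXml(), assuming a hypothetical document that mixes a
       prefixed element with unprefixed markup:

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

my $xml = '<doc xmlns:h="http://www.w3.org/1999/xhtml">'
        . '<h:title>Intro</h:title>'
        . '<p>Hello <b>world</b></p>'
        . '</doc>';
my $reader = XML::LibXML::Reader->new(string => $xml);

# Name-related accessors on a namespaced element.
$reader->nextElement('title', 'http://www.w3.org/1999/xhtml');
my @names = ($reader->name, $reader->localName,
             $reader->prefix, $reader->namespaceURI);

# Serializing an element's content versus the element itself.
$reader->nextElement('p');
my $inner = $reader->readInnerXml;   # content only
my $outer = $reader->readOuterXml;   # content wrapped in <p>...</p>

print "@names\n$inner\n$outer\n";
```
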
METHODS EXTRACTING DOM NODES
       document ()
           Provides access to the document tree built by the reader. This
           function can be used to collect the preserved nodes (see
           preserveNode() and preservePattern()).

           CAUTION: Never use this function to modify the tree unless
           reading of the whole document is completed!

       copyCurrentNode (deep)
           This function is similar to the DOM function copyNode(). It
           returns a copy of the currently processed node as a corresponding
           DOM object. Use deep = 1 to obtain the full subtree.

       preserveNode ()
           This tells the XML Reader to preserve the current node in the
           document tree. A document tree consisting of the preserved nodes
           and their content can be obtained using the method document()
           once parsing is finished.

           Returns the node, or undef in case of error.

       preservePattern (pattern,\%ns_map)
           This tells the XML Reader to preserve all nodes matched by the
           pattern (which is a streaming XPath subset). A document tree
           consisting of the preserved nodes and their content can be
           obtained using the method document() once parsing is finished.

           An optional second argument can be used to provide a HASH
           reference mapping prefixes used by the XPath to namespace URIs.

           The XPath subset available with this function is described at

             http://www.w3.org/TR/xmlschema-1/#Selector

           and matches the production

             Path ::= ('.//')? ( Step '/' )* ( Step | '@' NameTest )

           Returns a positive number in case of success and -1 in case of
           error.

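       When the document uses namespaces, prefixes in the pattern are
       resolved through the hash reference passed to preservePattern(). In
       this sketch the Atom feed is hypothetical, and the prefix a is an
       arbitrary choice that only has to agree between the pattern and the
       map; the document itself uses a default namespace.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical feed: entries live in the Atom namespace.
my $xml = '<feed xmlns="http://www.w3.org/2005/Atom">'
        . '<title>News</title>'
        . '<entry><title>One</title></entry>'
        . '<entry><title>Two</title></entry>'
        . '</feed>';

my $reader = XML::LibXML::Reader->new(string => $xml);

# The prefix "a" exists only in the pattern; the map binds it to the
# namespace URI actually used in the document.
$reader->preservePattern('//a:entry', { a => 'http://www.w3.org/2005/Atom' });
$reader->finish;

# Inspect the collected tree with ordinary DOM calls.
my $doc     = $reader->document;
my @entries = $doc->findnodes('//*[local-name()="entry"]');
print scalar(@entries), " entries preserved\n";
```
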
METHODS PROCESSING ATTRIBUTES
       attributeCount ()
           Provides the number of attributes of the current node.

       hasAttributes ()
           Whether the node has attributes.

       getAttribute (name)
           Provides the value of the attribute with the specified qualified
           name.

           Returns a string containing the value of the specified attribute,
           or undef in case of error.

       getAttributeNs (localName, namespaceURI)
           Provides the value of the attribute with the specified local name
           and namespace URI.

           Returns a string containing the value of the specified attribute,
           or undef in case of error.

       getAttributeNo (no)
           Provides the value of the attribute with the specified index
           relative to the containing element.

           Returns a string containing the value of the specified attribute,
           or undef in case of error.

       isDefault ()
           Returns true if the current attribute node was generated from the
           default value defined in the DTD.

       moveToAttribute (name)
           Moves the position to the attribute with the specified qualified
           name.

           Returns 1 in case of success, -1 in case of error, 0 if not
           found.

       moveToAttributeNo (no)
           Moves the position to the attribute with the specified index
           relative to the containing element.

           Returns 1 in case of success, -1 in case of error, 0 if not
           found.

       moveToAttributeNs (localName,namespaceURI)
           Moves the position to the attribute with the specified local name
           and namespace URI.

           Returns 1 in case of success, -1 in case of error, 0 if not
           found.

       moveToFirstAttribute ()
           Moves the position to the first attribute associated with the
           current node.

           Returns 1 in case of success, -1 in case of error, 0 if not
           found.

       moveToNextAttribute ()
           Moves the position to the next attribute associated with the
           current node.

           Returns 1 in case of success, -1 in case of error, 0 if not
           found.

       moveToElement ()
           Moves the position to the node that contains the current
           attribute node.

           Returns 1 in case of success, -1 in case of error, 0 if not
           moved.

       isNamespaceDecl ()
           Determines whether the current node is a namespace declaration
           rather than a regular attribute.

           Returns 1 if the current node is a namespace declaration, 0 if it
           is a regular attribute or other type of node, or -1 in case of
           error.

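       The attribute-cursor methods are typically used in a loop like the
       following sketch (the <img> element is hypothetical): move to the
       first attribute, step through the rest, then return to the owning
       element with moveToElement() before continuing to read.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(
    string => '<img src="logo.png" alt="Logo" width="120"/>');
$reader->nextElement('img');

my $count = $reader->attributeCount;
my %attrs;

# Walk the attribute list of the current element one node at a time.
if ($reader->moveToFirstAttribute == 1) {
    do {
        $attrs{ $reader->name } = $reader->value;
    } while ($reader->moveToNextAttribute == 1);
    $reader->moveToElement;   # return to <img> before reading on
}

print join(', ', map { "$_=$attrs{$_}" } sort keys %attrs), "\n";
```
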
OTHER METHODS
       lookupNamespace (prefix)
           Resolves a namespace prefix in the scope of the current element.

           Returns a string containing the namespace URI to which the prefix
           maps, or undef in case of error.

       encoding ()
           Returns a string containing the encoding of the document, or
           undef in case of error.

       standalone ()
           Determines the standalone status of the document being read.
           Returns 1 if the document was declared to be standalone, 0 if it
           was declared to be not standalone, or -1 if the document did not
           specify its standalone status or in case of error.

       xmlVersion ()
           Determines the XML version of the document being read. Returns a
           string containing the XML version of the document, or undef in
           case of error.

       baseURI ()
           The base URI of the node. See the XML Base W3C specification.

       isValid ()
           Retrieves the validity status from the parser.

           Returns 1 if valid, 0 if not, and -1 in case of error.

       xmlLang ()
           The xml:lang scope within which the node resides.

       lineNumber ()
           Provides the line number of the current parsing point. Available
           if libxml2 >= 2.6.17.

       columnNumber ()
           Provides the column number of the current parsing point.
           Available if libxml2 >= 2.6.17.

       byteConsumed ()
           This function provides the current index of the parser relative
           to the start of the current entity. The index is counted in bytes
           from the beginning, starting at zero and, when parsing a file,
           finishing at the size of the file in bytes. The function is of
           constant cost if the input is UTF-8 but can be costly if run on
           non-UTF-8 input. Available if libxml2 >= 2.6.18.

       setParserProp (prop => value, ...)
           Changes the parser processing behaviour by changing some of its
           internal properties. The following properties are available with
           this function: ``load_ext_dtd'', ``complete_attributes'',
           ``validation'', ``expand_entities''.

           Since some of the properties can only be changed before any read
           has been done, it is best to set the parsing properties at the
           constructor.

           Returns 0 if the call was successful, or -1 in case of error.

       getParserProp (prop)
           Gets the value of a parser internal property. The following
           property names can be used: ``load_ext_dtd'',
           ``complete_attributes'', ``validation'', ``expand_entities''.

           Returns the value, usually 0 or 1, or -1 in case of error.

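       A brief sketch of three of the context queries above, evaluated on a
       hypothetical record whose namespace declaration and xml:lang
       attribute sit on the parent of the current element:

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

my $xml = '<?xml version="1.0"?>'
        . '<record xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en">'
        . '<dc:title>Example</dc:title>'
        . '</record>';

my $reader = XML::LibXML::Reader->new(string => $xml);
$reader->nextElement('title', 'http://purl.org/dc/elements/1.1/');

my $uri     = $reader->lookupNamespace('dc');   # declared on the parent
my $lang    = $reader->xmlLang;                 # inherited from <record>
my $version = $reader->xmlVersion;              # from the XML declaration

print "$uri $lang $version\n";
```
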
DESTRUCTION
       XML::LibXML takes care of the reader object destruction when the last
       reference to the reader object goes out of scope. The document tree
       is preserved, though, if either of $reader->document or
       $reader->preserveNode was used and references to the document tree
       exist.

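       For example, in the sketch below (the log document is hypothetical)
       the reader is confined to an inner block, yet the preserved tree
       remains usable afterwards because $doc still refers to it.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

my $doc;
{
    # The reader lives only inside this block.
    my $reader = XML::LibXML::Reader->new(
        string => '<log><entry level="error">disk full</entry>'
                . '<entry level="info">ok</entry></log>');
    $reader->preservePattern('//entry');
    $reader->finish;
    $doc = $reader->document;   # keep a reference to the preserved tree
}

# The reader has been destroyed here, but the document survives via $doc.
print $doc->toString, "\n";
```
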
NODE TYPES
       The reader interface provides the following constants for node types
       (the constant symbols are exported by default or if tag :types is
       used).

         XML_READER_TYPE_NONE                    => 0
         XML_READER_TYPE_ELEMENT                 => 1
         XML_READER_TYPE_ATTRIBUTE               => 2
         XML_READER_TYPE_TEXT                    => 3
         XML_READER_TYPE_CDATA                   => 4
         XML_READER_TYPE_ENTITY_REFERENCE        => 5
         XML_READER_TYPE_ENTITY                  => 6
         XML_READER_TYPE_PROCESSING_INSTRUCTION  => 7
         XML_READER_TYPE_COMMENT                 => 8
         XML_READER_TYPE_DOCUMENT                => 9
         XML_READER_TYPE_DOCUMENT_TYPE           => 10
         XML_READER_TYPE_DOCUMENT_FRAGMENT       => 11
         XML_READER_TYPE_NOTATION                => 12
         XML_READER_TYPE_WHITESPACE              => 13
         XML_READER_TYPE_SIGNIFICANT_WHITESPACE  => 14
         XML_READER_TYPE_END_ELEMENT             => 15
         XML_READER_TYPE_END_ENTITY              => 16
         XML_READER_TYPE_XML_DECLARATION         => 17

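       The constants are typically used to dispatch on nodeType(). In this
       sketch over a hypothetical inline document, note that the empty
       element <c/> yields a start event but no XML_READER_TYPE_END_ELEMENT,
       in line with the isEmptyElement() remark above.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;   # exports the XML_READER_TYPE_* constants

my $reader = XML::LibXML::Reader->new(
    string => '<a><b>text</b><c/><!--note--></a>');

# Record one event string per node type of interest.
my @events;
while ($reader->read == 1) {
    my $type = $reader->nodeType;
    if ($type == XML_READER_TYPE_ELEMENT) {
        push @events, 'start ' . $reader->name;
    } elsif ($type == XML_READER_TYPE_END_ELEMENT) {
        push @events, 'end ' . $reader->name;
    } elsif ($type == XML_READER_TYPE_TEXT) {
        push @events, 'text ' . $reader->value;
    } elsif ($type == XML_READER_TYPE_COMMENT) {
        push @events, 'comment ' . $reader->value;
    }
}
print join("\n", @events), "\n";
```
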
STATES
       The following constants represent the values returned by
       readState(). They are exported by default, or if tag :states is
       used:

         XML_READER_NONE       => -1
         XML_READER_START      => 0
         XML_READER_ELEMENT    => 1
         XML_READER_END        => 2
         XML_READER_EMPTY      => 3
         XML_READER_BACKTRACK  => 4
         XML_READER_DONE       => 5
         XML_READER_ERROR      => 6

VERSION
       0.02

AUTHORS
       Heiko Klein, <H.Klein@gmx.net> and Petr Pajas, <pajas@matfyz.cz>

SEE ALSO
       http://xmlsoft.org/html/libxml-xmlreader.html

       http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html

AUTHORS
       Matt Sergeant, Christian Glahn, Petr Pajas

VERSION
       1.62

COPYRIGHT
       2001-2006, AxKit.com Ltd; 2002-2006 Christian Glahn; 2006 Petr Pajas.
       All rights reserved.

perl v5.8.8                       2006-11-17          XML::LibXML::Reader(3)