XML::LibXML::Reader(3)   User Contributed Perl Documentation   XML::LibXML::Reader(3)

NAME
    XML::LibXML::Reader - interface to libxml2 pull parser

SYNOPSIS
        use XML::LibXML::Reader;

        # Walk the document node by node.
        my $reader = XML::LibXML::Reader->new(location => "file.xml")
            or die "cannot read file.xml\n";
        while ($reader->read) {
            processNode($reader);
        }

        # Print depth, node type, name and empty-element flag of a node.
        sub processNode {
            my $reader = shift;
            printf "%d %d %s %d\n", ($reader->depth,
                                     $reader->nodeType,
                                     $reader->name,
                                     $reader->isEmptyElement);
        }

    or

        # Keep only the nodes matching a pattern and print them as a tree.
        my $reader = XML::LibXML::Reader->new(location => "file.xml")
            or die "cannot read file.xml\n";
        $reader->preservePattern('//table/tr');
        $reader->finish;
        print $reader->document->toString(1);

DESCRIPTION
    This is a Perl interface to libxml2's pull-parser implementation,
    xmlTextReader (http://xmlsoft.org/html/libxml-xmlreader.html). This
    feature requires at least libxml2-2.6.21. Pull parsers (StAX in Java,
    XmlReader in C#) use an iterator approach to parse an XML file. They
    are easier to program than event-based parsers (SAX) and much more
    lightweight than tree-based parsers (DOM), which load the complete
    tree into memory.

    The Reader acts as a cursor moving forward through the document
    stream and stopping at each node on the way. At every point, DOM-like
    methods of the Reader object let you examine the current node (name,
    namespace, attributes, etc.).

    The user's code keeps control of the progress and simply calls the
    "read()" function repeatedly to advance to the next node in document
    order. Other functions provide means for skipping complete sub-trees,
    skipping nodes until a specific element is reached, etc.

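    As an illustration of the cursor model described above, the following
    minimal sketch reads a small in-memory document and records the depth
    and name of every element it passes. The sample XML and the @seen
    array are assumptions made for this example only.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample document (an assumption for this sketch).
my $xml = '<root><item id="1">one</item><item id="2"/></root>';

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

my @seen;
# read() returns 1 while a node was read successfully.
while ($reader->read == 1) {
    # Skip text nodes and end tags; keep element start tags only.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    push @seen, sprintf('%d:%s', $reader->depth, $reader->name);
}

print join(' ', @seen), "\n";
```

    Comparing the return value with 1 rather than testing it for truth
    keeps the loop from continuing on a -1 error return.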
    At any time, only a very limited portion of the document is kept in
    memory, which makes the API more memory-efficient than using DOM.
    However, it is also possible to mix Reader with DOM. At every point
    the user may copy the current node (optionally expanded into a
    complete sub-tree) from the processed document to another DOM tree,
    or instruct the Reader to collect a sub-document in the form of a DOM
    tree consisting of selected nodes.

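    The mixing of Reader and DOM mentioned above might look roughly like
    the sketch below: the reader streams through the document, and only
    the elements of interest are expanded into DOM nodes with
    "copyCurrentNode(1)". The log format, the "level" attribute and the
    @errors array are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical input: a small log with two entries.
my $xml = '<log>'
        . '<entry level="info">started</entry>'
        . '<entry level="error">disk full</entry>'
        . '</log>';

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

my @errors;
# Jump from one <entry> element to the next.
while ($reader->nextElement('entry') == 1) {
    my $level = $reader->getAttribute('level');
    next unless defined $level && $level eq 'error';

    # Expand just this element into a DOM sub-tree and use DOM methods.
    my $node = $reader->copyCurrentNode(1);
    push @errors, $node->textContent;
}

print "$_\n" for @errors;
```

    Only the matching entries are turned into DOM objects; the rest of
    the document is visited by the cursor without being copied.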
    The Reader API also supports namespaces, xml:base, entity handling,
    and DTD validation. RelaxNG and W3C XML Schema validation are
    available through the RelaxNG and Schema constructor options
    described in "Reader options".

    The naming of methods compared to libxml2 and C# XmlTextReader has
    been changed slightly to match the conventions of XML::LibXML. Some
    functions have been changed or added with respect to the C interface.

CONSTRUCTOR
    Depending on the XML source, the Reader object can be created with
    any of:

        my $reader = XML::LibXML::Reader->new( location => "file.xml", ... );
        my $reader = XML::LibXML::Reader->new( string => $xml_string, ... );
        my $reader = XML::LibXML::Reader->new( IO => $file_handle, ... );
        my $reader = XML::LibXML::Reader->new( FD => fileno(STDIN), ... );
        my $reader = XML::LibXML::Reader->new( DOM => $dom, ... );

    where ... are (optional) reader options described below in "Reader
    options" or various parser options described in XML::LibXML::Parser.
    The constructor recognizes the following XML sources:

  Source specification
    location
        Read XML from a local file or URL.

    string
        Read XML from a string.

    IO
        Read XML from a Perl IO filehandle.

    FD
        Read XML from a file descriptor (bypasses the Perl I/O layer; only
        applicable to filehandles for regular files or pipes). Possibly
        faster than IO.

    DOM
        Use the reader API to walk through a pre-parsed
        XML::LibXML::Document.

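    To illustrate that the same reading code can be used with different
    sources, the sketch below walks one document twice, once from a
    string and once from a pre-parsed DOM. The helper element_names() and
    the sample XML are hypothetical names introduced for this example.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

# Sample document (an assumption for this sketch).
my $xml = '<root><a/><b><c/></b></root>';

# Hypothetical helper: collect element names in document order.
sub element_names {
    my ($reader) = @_;
    my @names;
    while ($reader->read == 1) {
        push @names, $reader->name
            if $reader->nodeType == XML_READER_TYPE_ELEMENT;
    }
    return @names;
}

# Source 1: parse directly from a string.
my $string_reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader from string\n";
my @from_string = element_names($string_reader);

# Source 2: walk a document that was already parsed into a DOM.
my $dom = XML::LibXML->new->parse_string($xml);
my $dom_reader = XML::LibXML::Reader->new(DOM => $dom)
    or die "cannot create reader from DOM\n";
my @from_dom = element_names($dom_reader);

print "@from_string\n@from_dom\n";
```

    The $dom variable is kept in scope while the second reader is in use,
    since that reader walks the existing tree rather than parsing input.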
  Reader options
    encoding => $encoding
        Overrides the document encoding.

    RelaxNG => $rng_schema
        Can be used to pass either an XML::LibXML::RelaxNG object or a
        filename or URL of a RelaxNG schema to the constructor. The schema
        is then used to validate the document as it is processed.

    Schema => $xsd_schema
        Can be used to pass either an XML::LibXML::Schema object or a
        filename or URL of a W3C XSD schema to the constructor. The schema
        is then used to validate the document as it is processed.

    ...
        The reader further supports various parser options described in
        XML::LibXML::Parser (specifically those labeled by /reader/).

METHODS CONTROLLING PARSING PROGRESS
    read ()
        Moves the position to the next node in the stream, exposing its
        properties.

        Returns 1 if the node was read successfully, 0 if there are no
        more nodes to read, or -1 in case of error.

    readAttributeValue ()
        Parses an attribute value into one or more Text and
        EntityReference nodes.

        Returns 1 in case of success, 0 if the reader was not positioned
        on an attribute node or all the attribute values have been read,
        or -1 in case of error.

    readState ()
        Gets the read state of the reader. Returns the state value, or -1
        in case of error. The module exports constants for the Reader
        states, see STATES below.

    depth ()
        The depth of the node in the tree, starting at 0 for the root
        node.

    next ()
        Skips to the node following the current one in document order
        while avoiding the sub-tree, if any. Returns 1 if the node was
        read successfully, 0 if there are no more nodes to read, or -1 in
        case of error.

    nextElement (localname?,nsURI?)
        Skips nodes following the current one in document order until a
        specific element is reached. The element's name must be equal to
        the given localname if defined, and its namespace must be equal
        to the given nsURI if defined. Either of the arguments can be
        undefined (or omitted, in case of the latter or both).

        Returns 1 if the element was found, 0 if there are no more nodes
        to read, or -1 in case of error.

    nextPatternMatch (compiled_pattern)
        Skips nodes following the current one in document order until an
        element matching the given compiled pattern is reached. See
        XML::LibXML::Pattern for information on compiled patterns. See
        also the "matchesPattern" method.

        Returns 1 if the element was found, 0 if there are no more nodes
        to read, or -1 in case of error.

    skipSiblings ()
        Skips all nodes on the same or lower level until the first node
        on a higher level is reached. In particular, if the current node
        occurs in an element, the reader stops at the end tag of the
        parent element; otherwise it stops at a node immediately
        following the parent node.

        Returns 1 if successful, 0 if the end of the document is reached,
        or -1 in case of error.

    nextSibling ()
        Skips to the node following the current one in document order
        while avoiding the sub-tree, if any.

        Returns 1 if the node was read successfully, 0 if there are no
        more nodes to read, or -1 in case of error.

    nextSiblingElement (name?,nsURI?)
        Like nextElement, but only processes sibling elements of the
        current node (internally moving forward using "nextSibling ()"
        rather than "read ()").

        Returns 1 if the element was found, 0 if there are no more
        sibling nodes, or -1 in case of error.

    finish ()
        Skips all remaining nodes in the document, reaching the end of
        the document.

        Returns 1 if successful, 0 in case of error.

    close ()
        Releases any resources allocated by the current instance and
        closes any underlying input. It returns 0 on failure and 1 on
        success. This method is called automatically by the destructor
        when the reader is forgotten, so you do not have to call it
        directly.

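    The difference between "read()" and "next()" described above can be
    sketched as follows: from the same starting element, "read()"
    descends into the sub-tree while "next()" steps over it. The element
    names in the sample document and the helper position_after() are
    hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample document (an assumption for this sketch).
my $xml = '<doc><skip><a/><b/></skip><keep>text</keep></doc>';

# Hypothetical helper: position a fresh reader on <skip>, advance once
# with the given method, and report the name of the node reached.
sub position_after {
    my ($method) = @_;
    my $reader = XML::LibXML::Reader->new(string => $xml)
        or die "cannot create reader\n";
    $reader->nextElement('skip') == 1
        or die "no <skip> element found\n";
    $reader->$method == 1
        or die "$method did not reach another node\n";
    return $reader->name;
}

my $after_read = position_after('read');   # enters the sub-tree
my $after_next = position_after('next');   # steps over the sub-tree

print "read: $after_read, next: $after_next\n";
```

    Using "next()" is a convenient way to ignore a sub-tree whose content
    is not needed, without inspecting each of its nodes.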
METHODS EXTRACTING INFORMATION
    name ()
        Returns the qualified name of the current node, equal to
        (Prefix:)LocalName.

    nodeType ()
        Returns the type of the current node. See NODE TYPES below.

    localName ()
        Returns the local name of the node.

    prefix ()
        Returns the prefix of the namespace associated with the node.

    namespaceURI ()
        Returns the URI defining the namespace associated with the node.

    isEmptyElement ()
        Checks whether the current node is empty. This is somewhat
        counter-intuitive in that <a/> is considered empty while <a></a>
        is not.

    hasValue ()
        Returns true if the node can have a text value.

    value ()
        Provides the text value of the node if present, or undef if not
        available.

    readInnerXml ()
        Reads the contents of the current node, including child nodes and
        markup. Returns a string containing the XML of the node's
        content, or undef if the current node is neither an element nor
        an attribute, or has no child nodes.

    readOuterXml ()
        Reads the contents of the current node, including child nodes and
        markup.

        Returns a string containing the XML of the node including its
        content, or undef if the current node is neither an element nor
        an attribute.

    nodePath ()
        Returns a canonical location path from the root node to the
        current element. Namespaced elements are matched by '*', because
        there is no way to declare prefixes within XPath patterns. Unlike
        "XML::LibXML::Node::nodePath()", this function does not provide
        sibling counts (i.e. instead of e.g. '/a/b[1]' and '/a/b[2]' you
        get '/a/b' for both matches).

    matchesPattern (compiled_pattern)
        Returns a true value if the current node matches a compiled
        pattern. See XML::LibXML::Pattern for information on compiled
        patterns. See also the "nextPatternMatch" method.

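    As a rough sketch of how "matchesPattern" might be combined with a
    plain read loop, the example below collects the text of the <title>
    elements that sit directly inside a <section>. The document structure
    and the pattern are assumptions made for this example.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Pattern;
use XML::LibXML::Reader;

# Sample document (an assumption for this sketch).
my $xml = '<doc><title>Top</title>'
        . '<section><title>One</title></section>'
        . '<section><title>Two</title></section></doc>';

# Compile the pattern once and reuse it for every node.
my $pattern = XML::LibXML::Pattern->new('/doc/section/title');

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

my @titles;
while ($reader->read == 1) {
    # Test start tags only, so each element is considered once.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    next unless $reader->matchesPattern($pattern);
    push @titles, $reader->copyCurrentNode(1)->textContent;
}

print join(', ', @titles), "\n";
```

    The explicit node type check is a defensive choice in this sketch; it
    keeps end tags of matching elements from being counted as well.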
METHODS EXTRACTING DOM NODES
    document ()
        Provides access to the document tree built by the reader. This
        function can be used to collect the preserved nodes (see
        "preserveNode()" and "preservePattern()").

        CAUTION: Never use this function to modify the tree unless
        reading of the whole document is completed!

    copyCurrentNode (deep)
        This function is similar to the DOM function "copyNode()". It
        returns a copy of the currently processed node as a corresponding
        DOM object. Use deep = 1 to obtain the full sub-tree.

    preserveNode ()
        Tells the XML Reader to preserve the current node in the document
        tree. A document tree consisting of the preserved nodes and their
        content can be obtained using the method "document()" once
        parsing is finished.

        Returns the node, or undef in case of error.

    preservePattern (pattern,\%ns_map)
        Tells the XML Reader to preserve all nodes matched by the pattern
        (which is a streaming XPath subset). A document tree consisting
        of the preserved nodes and their content can be obtained using
        the method "document()" once parsing is finished.

        An optional second argument can be used to provide a HASH
        reference mapping prefixes used by the XPath to namespace URIs.

        The XPath subset available with this function is described at

            http://www.w3.org/TR/xmlschema-1/#Selector

        and matches the production

            Path ::= ('.//')? ( Step '/' )* ( Step | '@' NameTest )

        Returns a positive number in case of success and -1 in case of
        error.

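    A minimal sketch of the preserve-then-collect workflow described
    above is given here, using an in-memory document instead of a file.
    The sample HTML-like input is an assumption; the exact shape of the
    resulting tree outside the preserved nodes is not asserted.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample document (an assumption for this sketch).
my $xml = '<html><body><p>intro</p>'
        . '<table><tr><td>1</td></tr><tr><td>2</td></tr></table>'
        . '</body></html>';

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

# Ask the reader to keep every <tr> that is a child of a <table>.
$reader->preservePattern('//table/tr');

# Run through the rest of the document without inspecting nodes.
$reader->finish;

# The preserved nodes are now available as a DOM tree.
my $doc  = $reader->document;
my @rows = $doc->findnodes('//tr');

print scalar(@rows), " rows preserved\n";
print $doc->toString(1);
```

    The tree is only inspected after "finish()" has returned, in line
    with the caution given for "document()".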
METHODS PROCESSING ATTRIBUTES
    attributeCount ()
        Provides the number of attributes of the current node.

    hasAttributes ()
        Returns true if the node has attributes.

    getAttribute (name)
        Provides the value of the attribute with the specified qualified
        name.

        Returns a string containing the value of the specified attribute,
        or undef in case of error.

    getAttributeNs (localName, namespaceURI)
        Provides the value of the specified attribute.

        Returns a string containing the value of the specified attribute,
        or undef in case of error.

    getAttributeNo (no)
        Provides the value of the attribute with the specified index
        relative to the containing element.

        Returns a string containing the value of the specified attribute,
        or undef in case of error.

    isDefault ()
        Returns true if the current attribute node was generated from the
        default value defined in the DTD.

    moveToAttribute (name)
        Moves the position to the attribute with the specified qualified
        name.

        Returns 1 in case of success, -1 in case of error, 0 if not
        found.

    moveToAttributeNo (no)
        Moves the position to the attribute with the specified index
        relative to the containing element.

        Returns 1 in case of success, -1 in case of error, 0 if not
        found.

    moveToAttributeNs (localName,namespaceURI)
        Moves the position to the attribute with the specified local name
        and namespace URI.

        Returns 1 in case of success, -1 in case of error, 0 if not
        found.

    moveToFirstAttribute ()
        Moves the position to the first attribute associated with the
        current node.

        Returns 1 in case of success, -1 in case of error, 0 if not
        found.

    moveToNextAttribute ()
        Moves the position to the next attribute associated with the
        current node.

        Returns 1 in case of success, -1 in case of error, 0 if not
        found.

    moveToElement ()
        Moves the position to the node that contains the current
        attribute node.

        Returns 1 in case of success, -1 in case of error, 0 if not
        moved.

    isNamespaceDecl ()
        Determines whether the current node is a namespace declaration
        rather than a regular attribute.

        Returns 1 if the current node is a namespace declaration, 0 if it
        is a regular attribute or other type of node, or -1 in case of
        error.

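    The attribute navigation methods above can be combined to visit all
    attributes of an element, roughly as sketched below. The sample
    element and the %attr hash are assumptions made for this example.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample document (an assumption for this sketch).
my $xml = '<img src="a.png" alt="logo" width="10"/>';

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";
$reader->read == 1 or die "cannot read the root element\n";

my $count = $reader->attributeCount;

my %attr;
if ($reader->moveToFirstAttribute == 1) {
    # The cursor now points at an attribute; name() and value()
    # describe that attribute rather than the element.
    do {
        $attr{ $reader->name } = $reader->value;
    } while ($reader->moveToNextAttribute == 1);

    # Return the cursor to the element that owns the attributes.
    $reader->moveToElement;
}

my $element = $reader->name;
print "$element has $count attributes\n";
print "$_=$attr{$_}\n" for sort keys %attr;
```

    Calling "moveToElement()" at the end restores the position, so that
    later calls such as "name()" refer to the element again.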
OTHER METHODS
    lookupNamespace (prefix)
        Resolves a namespace prefix in the scope of the current element.

        Returns a string containing the namespace URI to which the prefix
        maps, or undef in case of error.

    encoding ()
        Returns a string containing the encoding of the document, or
        undef in case of error.

    standalone ()
        Determines the standalone status of the document being read.
        Returns 1 if the document was declared to be standalone, 0 if it
        was declared to be not standalone, or -1 if the document did not
        specify its standalone status or in case of error.

    xmlVersion ()
        Determines the XML version of the document being read. Returns a
        string containing the XML version of the document, or undef in
        case of error.

    baseURI ()
        Returns the base URI of the current node.

    isValid ()
        Retrieves the validity status from the parser.

        Returns 1 if valid, 0 if not, and -1 in case of error.

    xmlLang ()
        Returns the xml:lang scope within which the node resides.

    lineNumber ()
        Provides the line number of the current parsing point.

    columnNumber ()
        Provides the column number of the current parsing point.

    byteConsumed ()
        Provides the current index of the parser relative to the start of
        the current entity. The index is counted in bytes, starting at
        zero and finishing at the size in bytes of the file if parsing a
        file. The function is of constant cost if the input is UTF-8 but
        can be costly if run on non-UTF-8 input.

    setParserProp (prop => value, ...)
        Changes the parser processing behaviour by changing some of its
        internal properties. The following properties are available with
        this function: "load_ext_dtd", "complete_attributes",
        "validation", "expand_entities".

        Since some of the properties can only be changed before any read
        has been done, it is best to set the parsing properties in the
        constructor.

        Returns 0 if the call was successful, or -1 in case of error.

    getParserProp (prop)
        Gets the value of an internal parser property. The following
        property names can be used: "load_ext_dtd",
        "complete_attributes", "validation", "expand_entities".

        Returns the value, usually 0 or 1, or -1 in case of error.

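    As a small illustration of the document-level and namespace methods
    in this section, the sketch below positions the reader on a
    namespaced element and queries a few properties. The namespace URI
    "urn:example:x" and the element names are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample document (an assumption for this sketch).
my $xml = '<?xml version="1.0" standalone="yes"?>'
        . '<x:root xmlns:x="urn:example:x"><x:child/></x:root>';

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

# Find <child> in the given namespace, whatever prefix it uses.
$reader->nextElement('child', 'urn:example:x') == 1
    or die "element not found\n";

my $name       = $reader->name;                  # qualified name
my $local      = $reader->localName;             # name without prefix
my $resolved   = $reader->lookupNamespace('x');  # URI bound to prefix x
my $version    = $reader->xmlVersion;
my $standalone = $reader->standalone;

print "$name ($local) in $resolved\n";
print "XML $version, standalone=$standalone\n";
```

    The document-level queries are made after the first successful read,
    since the reader only knows the XML declaration once parsing has
    started.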
DESTRUCTION
    XML::LibXML takes care of the reader object destruction when the last
    reference to the reader object goes out of scope. The document tree
    is preserved, though, if either of $reader->document or
    $reader->preserveNode was used and references to the document tree
    exist.

NODE TYPES
    The reader interface provides the following constants for node types
    (the constant symbols are exported by default or if tag ":types" is
    used).

        XML_READER_TYPE_NONE                    => 0
        XML_READER_TYPE_ELEMENT                 => 1
        XML_READER_TYPE_ATTRIBUTE               => 2
        XML_READER_TYPE_TEXT                    => 3
        XML_READER_TYPE_CDATA                   => 4
        XML_READER_TYPE_ENTITY_REFERENCE        => 5
        XML_READER_TYPE_ENTITY                  => 6
        XML_READER_TYPE_PROCESSING_INSTRUCTION  => 7
        XML_READER_TYPE_COMMENT                 => 8
        XML_READER_TYPE_DOCUMENT                => 9
        XML_READER_TYPE_DOCUMENT_TYPE           => 10
        XML_READER_TYPE_DOCUMENT_FRAGMENT       => 11
        XML_READER_TYPE_NOTATION                => 12
        XML_READER_TYPE_WHITESPACE              => 13
        XML_READER_TYPE_SIGNIFICANT_WHITESPACE  => 14
        XML_READER_TYPE_END_ELEMENT             => 15
        XML_READER_TYPE_END_ENTITY              => 16
        XML_READER_TYPE_XML_DECLARATION         => 17

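    One possible use of these constants is a lookup table that turns the
    numeric result of "nodeType()" into a readable label, as in the
    sketch below. The %label table and the sample document are
    assumptions made for this example.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample document (an assumption for this sketch).
my $xml = '<a><!--note--><b>hi</b></a>';

# Hypothetical lookup table; the trailing () keeps Perl from treating
# the constant names as plain strings on the left of =>.
my %label = (
    XML_READER_TYPE_ELEMENT()     => 'element',
    XML_READER_TYPE_TEXT()        => 'text',
    XML_READER_TYPE_COMMENT()     => 'comment',
    XML_READER_TYPE_END_ELEMENT() => 'end',
);

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

my @events;
while ($reader->read == 1) {
    my $type = $reader->nodeType;
    # Fall back to the raw number for types missing from the table.
    push @events, exists $label{$type} ? $label{$type} : $type;
}

print "@events\n";
```

    Note that the closing tags of <b> and <a> are reported as separate
    XML_READER_TYPE_END_ELEMENT nodes.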
STATES
    The following constants represent the values returned by
    "readState()". They are exported by default, or if tag ":states" is
    used:

        XML_READER_NONE      => -1
        XML_READER_START     =>  0
        XML_READER_ELEMENT   =>  1
        XML_READER_END       =>  2
        XML_READER_EMPTY     =>  3
        XML_READER_BACKTRACK =>  4
        XML_READER_DONE      =>  5
        XML_READER_ERROR     =>  6

SEE ALSO
    XML::LibXML::Pattern for information about compiled patterns.

    http://xmlsoft.org/html/libxml-xmlreader.html

    http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html

ORIGINAL IMPLEMENTATION
    Heiko Klein, <H.Klein@gmx.net> and Petr Pajas

AUTHORS
    Matt Sergeant, Christian Glahn, Petr Pajas

VERSION
    1.70

COPYRIGHT
    2001-2007, AxKit.com Ltd.

    2002-2006, Christian Glahn.

    2006-2009, Petr Pajas.

perl v5.12.0                      2009-10-07            XML::LibXML::Reader(3)