XML::LibXML::Reader(3)  User Contributed Perl Documentation  XML::LibXML::Reader(3)

NAME
       XML::LibXML::Reader - interface to libxml2 pull parser

SYNOPSIS
           use XML::LibXML::Reader;

           my $reader = XML::LibXML::Reader->new(location => "file.xml")
               or die "cannot read file.xml\n";
           while ($reader->read) {
               processNode($reader);
           }

           sub processNode {
               my $reader = shift;
               printf "%d %d %s %d\n", ($reader->depth,
                                        $reader->nodeType,
                                        $reader->name,
                                        $reader->isEmptyElement);
           }

       or

           my $reader = XML::LibXML::Reader->new(location => "file.xml")
               or die "cannot read file.xml\n";
           $reader->preservePattern('//table/tr');
           $reader->finish;
           print $reader->document->toString(1);

DESCRIPTION
       This is a Perl interface to libxml2's pull-parser implementation
       xmlTextReader http://xmlsoft.org/html/libxml-xmlreader.html. This
       feature requires at least libxml2-2.6.21. Pull-parsers (such as StAX in
       Java, or XmlReader in C#) use an iterator approach to parse XML
       documents. They are easier to program than event-based parsers (SAX)
       and much more lightweight than tree-based parsers (DOM), which load the
       complete tree into memory.

       The Reader acts as a cursor going forward on the document stream and
       stopping at each node on the way. At every point, the DOM-like methods
       of the Reader object allow one to examine the current node (name,
       namespace, attributes, etc.).

       The user's code keeps control of the progress and simply calls the
       "read()" function repeatedly to progress to the next node in document
       order. Other functions provide means for skipping complete sub-trees,
       or for skipping nodes until a specific element is reached.

       At any time, only a very limited portion of the document is kept in
       memory, which makes the API more memory-efficient than DOM. However,
       it is also possible to mix Reader with DOM. At every point the user
       may copy the current node (optionally expanded into a complete
       sub-tree) from the processed document to another DOM tree, or instruct
       the Reader to collect a sub-document, in the form of a DOM tree,
       consisting of selected nodes.

       The Reader API also supports namespaces, xml:base, entity handling, and
       DTD validation. RelaxNG and W3C XSD schema validation are available
       through the RelaxNG and Schema constructor options described under
       "Reader options".

       The naming of methods compared to libxml2 and C# XmlTextReader has been
       changed slightly to match the conventions of XML::LibXML. Some
       functions have been changed or added with respect to the C interface.

CONSTRUCTOR
       Depending on the XML source, the Reader object can be created with
       any of:

           my $reader = XML::LibXML::Reader->new( location => "file.xml", ... );
           my $reader = XML::LibXML::Reader->new( string => $xml_string, ... );
           my $reader = XML::LibXML::Reader->new( IO => $file_handle, ... );
           my $reader = XML::LibXML::Reader->new( FD => fileno(STDIN), ... );
           my $reader = XML::LibXML::Reader->new( DOM => $dom, ... );

       where ... are (optional) reader options described below in "Reader
       options" or various parser options described in XML::LibXML::Parser.
       The constructor recognizes the following XML sources:

   Source specification
       location
           Read XML from a local file or (non-HTTPS) URL.

       string
           Read XML from a string.

       IO  Read XML from a Perl IO filehandle.

       FD  Read XML from a file descriptor (bypasses the Perl I/O layer; only
           applicable to filehandles for regular files or pipes). Possibly
           faster than IO.

       DOM Use the reader API to walk through a pre-parsed
           XML::LibXML::Document.
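
       As an illustration of the sources above, the following minimal sketch
       walks the same XML once from a string and once from a pre-parsed DOM.
       The helper element_names() and the sample XML are hypothetical and
       exist only for this example; both readers are expected to report the
       same element names.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

# Sample input (hypothetical, for illustration only).
my $xml = '<list><item>a</item><item>b</item></list>';

# Hypothetical helper: collect the names of all start-element nodes.
sub element_names {
    my ($reader) = @_;
    my @names;
    while ($reader->read) {
        push @names, $reader->name
            if $reader->nodeType == XML_READER_TYPE_ELEMENT;
    }
    return @names;
}

# Source 1: an in-memory string.
my @from_string = element_names(
    XML::LibXML::Reader->new(string => $xml)
);

# Source 2: a document that was already parsed into a DOM tree.
my $dom      = XML::LibXML->load_xml(string => $xml);
my @from_dom = element_names(
    XML::LibXML::Reader->new(DOM => $dom)
);

print "string: @from_string\n";
print "DOM:    @from_dom\n";
```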

   Reader options
       encoding => $encoding
           Override the document encoding.

       RelaxNG => $rng_schema
           Can be used to pass either an XML::LibXML::RelaxNG object or a
           filename or (non-HTTPS) URL of a RelaxNG schema to the constructor.
           The schema is then used to validate the document as it is
           processed.

       Schema => $xsd_schema
           Can be used to pass either an XML::LibXML::Schema object or a
           filename or (non-HTTPS) URL of a W3C XSD schema to the constructor.
           The schema is then used to validate the document as it is
           processed.

       ... The reader further supports various parser options described in
           XML::LibXML::Parser (specifically those labeled by /reader/).

METHODS CONTROLLING PARSING PROGRESS
       read ()
           Moves the position to the next node in the stream, exposing its
           properties.

           Returns 1 if the node was read successfully, 0 if there are no
           more nodes to read, or -1 in case of error.
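
           A minimal sketch of a read() loop, assuming a small in-memory
           document (the sample XML and variable names are hypothetical). It
           records the depth and name of each start element, collects text
           values, and counts comments.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample input (hypothetical, for illustration only).
my $reader = XML::LibXML::Reader->new(
    string => '<doc><title>Hello</title><!-- note --><p>World</p></doc>'
);

my (@elements, @texts);
my $comments = 0;

# read() returns 1 for every node it stops at and 0 at the end of input.
while ($reader->read) {
    my $type = $reader->nodeType;
    if ($type == XML_READER_TYPE_ELEMENT) {
        push @elements, $reader->depth . ':' . $reader->name;
    }
    elsif ($type == XML_READER_TYPE_TEXT) {
        push @texts, $reader->value;
    }
    elsif ($type == XML_READER_TYPE_COMMENT) {
        $comments++;
    }
}

print "elements: @elements\n";
print "texts:    @texts\n";
print "comments: $comments\n";
```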

       readAttributeValue ()
           Parses an attribute value into one or more Text and EntityReference
           nodes.

           Returns 1 in case of success, 0 if the reader was not positioned on
           an attribute node or all the attribute values have been read, or -1
           in case of error.

       readState ()
           Gets the read state of the reader. Returns the state value, or -1
           in case of error. The module exports constants for the Reader
           states, see STATES below.

       depth ()
           Returns the depth of the node in the tree, starting at 0 for the
           root node.

       next ()
           Skips to the node following the current one in document order
           while avoiding the sub-tree, if any. Returns 1 if the node was
           read successfully, 0 if there are no more nodes to read, or -1 in
           case of error.
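
           The difference between read() and next() can be sketched as
           follows, assuming a reader created from a string (the sample XML
           and variable names are hypothetical). Calling next() on a start
           element jumps over everything inside that element.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample input (hypothetical, for illustration only).
my $reader = XML::LibXML::Reader->new(
    string => '<root><skip><deep>x</deep></skip><keep/></root>'
);

$reader->read;                 # cursor is on <root>
$reader->read;                 # cursor is on <skip>
my $before = $reader->name;

$reader->next;                 # jump over the whole <skip> sub-tree
my $after = $reader->name;

print "before next(): $before\n";
print "after next():  $after\n";
```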

       nextElement (localname?,nsURI?)
           Skips nodes following the current one in document order until a
           specific element is reached. The element's name must be equal to
           the given localname if defined, and its namespace must be equal to
           the given nsURI if defined. Either of the arguments can be
           undefined (or omitted, in case of the latter or both).

           Returns 1 if the element was found, 0 if there are no more nodes
           to read, or -1 in case of error.
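
           A minimal sketch of nextElement() used as a loop condition,
           assuming an in-memory document (the sample XML and variable names
           are hypothetical). Nodes other than <item> start elements are
           skipped without being handed to the caller.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample input (hypothetical, for illustration only).
my $reader = XML::LibXML::Reader->new(
    string => '<shop><name>Corner</name><item id="1"/><note/><item id="2"/></shop>'
);

my @ids;
# Each successful call leaves the cursor on the next <item> start element.
while ($reader->nextElement('item')) {
    push @ids, $reader->getAttribute('id');
}

print "item ids: @ids\n";
```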

       nextPatternMatch (compiled_pattern)
           Skips nodes following the current one in document order until an
           element matching the given compiled pattern is reached. See
           XML::LibXML::Pattern for information on compiled patterns. See
           also the "matchesPattern" method.

           Returns 1 if the element was found, 0 if there are no more nodes
           to read, or -1 in case of error.
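
           A minimal sketch of nextPatternMatch() with a compiled pattern,
           assuming an in-memory document (the sample XML, the pattern and
           the variable names are hypothetical). The node type check is a
           defensive assumption of this sketch: it makes sure only start
           elements are recorded.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

# Match <item> elements that are direct children of the <shop> root.
my $pattern = XML::LibXML::Pattern->new('/shop/item');

# Sample input (hypothetical, for illustration only).
my $reader = XML::LibXML::Reader->new(
    string => '<shop><item id="1"/><box><item id="2"/></box><item id="3"/></shop>'
);

my @ids;
while ($reader->nextPatternMatch($pattern)) {
    # Defensive check: record start elements only.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    push @ids, $reader->getAttribute('id');
}

print "matched ids: @ids\n";
```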

       skipSiblings ()
           Skips all nodes on the same or lower level until the first node on
           a higher level is reached. In particular, if the current node
           occurs in an element, the reader stops at the end tag of the
           parent element; otherwise it stops at a node immediately following
           the parent node.

           Returns 1 if successful, 0 if the end of the document is reached,
           or -1 in case of error.

       nextSibling ()
           Skips to the node following the current one in document order
           while avoiding the sub-tree, if any.

           Returns 1 if the node was read successfully, 0 if there are no
           more nodes to read, or -1 in case of error.

       nextSiblingElement (name?,nsURI?)
           Like nextElement, but only processes sibling elements of the
           current node (moving forward using "nextSibling ()" rather than
           "read ()", internally).

           Returns 1 if the element was found, 0 if there are no more sibling
           nodes, or -1 in case of error.

       finish ()
           Skips all remaining nodes in the document, reaching the end of the
           document.

           Returns 1 if successful, 0 in case of error.

       close ()
           Releases any resources allocated by the current instance and
           closes any underlying input. It returns 0 on failure and 1 on
           success. This method is called automatically by the destructor
           when the reader is forgotten, so you do not have to call it
           directly.

METHODS EXTRACTING INFORMATION
       name ()
           Returns the qualified name of the current node, equal to
           (Prefix:)LocalName.

       nodeType ()
           Returns the type of the current node. See NODE TYPES below.

       localName ()
           Returns the local name of the node.

       prefix ()
           Returns the prefix of the namespace associated with the node.

       namespaceURI ()
           Returns the URI defining the namespace associated with the node.

       isEmptyElement ()
           Checks whether the current node is empty. Somewhat
           counter-intuitively, <a/> is considered empty while <a></a> is not.
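
           The distinction can be seen in a minimal sketch such as the
           following (the sample XML and variable names are hypothetical).
           Note that an element reported as empty produces no separate
           end-element node, whereas <b></b> does.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample input (hypothetical, for illustration only).
my $reader = XML::LibXML::Reader->new(string => '<r><a/><b></b></r>');

my %is_empty;
my $end_tags = 0;
while ($reader->read) {
    my $type = $reader->nodeType;
    if ($type == XML_READER_TYPE_ELEMENT) {
        $is_empty{ $reader->name } = $reader->isEmptyElement ? 1 : 0;
    }
    elsif ($type == XML_READER_TYPE_END_ELEMENT) {
        $end_tags++;
    }
}

print "$_: $is_empty{$_}\n" for sort keys %is_empty;
print "end-element nodes: $end_tags\n";
```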

       hasValue ()
           Returns true if the node can have a text value.

       value ()
           Provides the text value of the node if present or undef if not
           available.

       readInnerXml ()
           Reads the contents of the current node, including child nodes and
           markup. Returns a string containing the XML of the node's content,
           or undef if the current node is neither an element nor attribute,
           or has no child nodes.

       readOuterXml ()
           Reads the contents of the current node, including child nodes and
           markup.

           Returns a string containing the XML of the node including its
           content, or undef if the current node is neither an element nor
           attribute.
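
           A minimal sketch contrasting the two methods, assuming an
           in-memory document (the sample XML and variable names are
           hypothetical). readInnerXml() serializes only the children of the
           current element, while readOuterXml() includes the element's own
           tags.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample input (hypothetical, for illustration only).
my $reader = XML::LibXML::Reader->new(
    string => '<r><p>Hi <b>there</b></p></r>'
);

$reader->nextElement('p');          # position the cursor on <p>

my $inner = $reader->readInnerXml;  # children of <p> only
my $outer = $reader->readOuterXml;  # <p> itself plus its children

print "inner: $inner\n";
print "outer: $outer\n";
```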

       nodePath()
           Returns a canonical location path from the root node to the
           current node. Namespaced elements are matched by '*', because
           there is no way to declare prefixes within XPath patterns. Unlike
           "XML::LibXML::Node::nodePath()", this function does not provide
           sibling counts (i.e. instead of e.g. '/a/b[1]' and '/a/b[2]' you
           get '/a/b' for both matches).

       matchesPattern(compiled_pattern)
           Returns a true value if the current node matches a compiled
           pattern. See XML::LibXML::Pattern for information on compiled
           patterns. See also the "nextPatternMatch" method.

METHODS EXTRACTING DOM NODES
       document ()
           Provides access to the document tree built by the reader. This
           function can be used to collect the preserved nodes (see
           "preserveNode()" and "preservePattern").

           CAUTION: Never use this function to modify the tree unless reading
           of the whole document is completed!

       copyCurrentNode (deep)
           This function is similar to the DOM function "copyNode()". It
           returns a copy of the currently processed node as a corresponding
           DOM object. Use deep = 1 to obtain the full sub-tree.
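
           Mixing Reader and DOM can be sketched as follows, assuming an
           in-memory log document (the sample XML, the <errors> wrapper
           element and the variable names are hypothetical). Only the
           selected entries are deep-copied into a separate DOM tree; the
           rest of the input is streamed past.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

# Sample input (hypothetical, for illustration only).
my $reader = XML::LibXML::Reader->new(
    string => '<log>'
            . '<entry level="info">start</entry>'
            . '<entry level="error">disk full</entry>'
            . '<entry level="error">timeout</entry>'
            . '</log>'
);

# Target DOM tree that collects the copied nodes.
my $out  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $out->createElement('errors');
$out->setDocumentElement($root);

while ($reader->nextElement('entry')) {
    next unless ($reader->getAttribute('level') // '') eq 'error';
    my $copy = $reader->copyCurrentNode(1);   # deep = 1: full sub-tree
    $root->appendChild($copy);                # node is imported into $out
}

print $out->toString(1);
```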

       preserveNode ()
           Tells the XML Reader to preserve the current node in the document
           tree. A document tree consisting of the preserved nodes and their
           content can be obtained using the method "document()" once parsing
           is finished.

           Returns the node or undef in case of error.

       preservePattern (pattern,\%ns_map)
           Tells the XML Reader to preserve all nodes matched by the pattern
           (which is a streaming XPath subset). A document tree consisting of
           the preserved nodes and their content can be obtained using the
           method "document()" once parsing is finished.

           An optional second argument can be used to provide a HASH
           reference mapping prefixes used by the XPath to namespace URIs.

           The XPath subset available with this function is described at

             http://www.w3.org/TR/xmlschema-1/#Selector

           and matches the production

             Path ::= ('.//')? ( Step '/' )* ( Step | '@' NameTest )

           Returns a positive number in case of success and -1 in case of
           error.

METHODS PROCESSING ATTRIBUTES
       attributeCount ()
           Provides the number of attributes of the current node.

       hasAttributes ()
           Returns true if the node has attributes.

       getAttribute (name)
           Provides the value of the attribute with the specified qualified
           name.

           Returns a string containing the value of the specified attribute,
           or undef in case of error.

       getAttributeNs (localName, namespaceURI)
           Provides the value of the specified attribute.

           Returns a string containing the value of the specified attribute,
           or undef in case of error.

       getAttributeNo (no)
           Provides the value of the attribute with the specified index
           relative to the containing element.

           Returns a string containing the value of the specified attribute,
           or undef in case of error.

       isDefault ()
           Returns true if the current attribute node was generated from the
           default value defined in the DTD.

       moveToAttribute (name)
           Moves the position to the attribute with the specified qualified
           name.

           Returns 1 in case of success, -1 in case of error, 0 if not found.

       moveToAttributeNo (no)
           Moves the position to the attribute with the specified index
           relative to the containing element.

           Returns 1 in case of success, -1 in case of error, 0 if not found.

       moveToAttributeNs (localName,namespaceURI)
           Moves the position to the attribute with the specified local name
           and namespace URI.

           Returns 1 in case of success, -1 in case of error, 0 if not found.

       moveToFirstAttribute ()
           Moves the position to the first attribute associated with the
           current node.

           Returns 1 in case of success, -1 in case of error, 0 if not found.

       moveToNextAttribute ()
           Moves the position to the next attribute associated with the
           current node.

           Returns 1 in case of success, -1 in case of error, 0 if not found.

       moveToElement ()
           Moves the position to the node that contains the current attribute
           node.

           Returns 1 in case of success, -1 in case of error, 0 if not moved.
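
           The attribute cursor methods above can be combined as in this
           minimal sketch (the sample XML and variable names are
           hypothetical). It visits every attribute of the current element
           and then returns the cursor to the element itself.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample input (hypothetical, for illustration only).
my $reader = XML::LibXML::Reader->new(
    string => '<img src="a.png" width="10" height="20"/>'
);
$reader->read;                         # cursor is on <img>

my @pairs;
if ($reader->moveToFirstAttribute == 1) {
    do {
        # On an attribute node, name() and value() describe the attribute.
        push @pairs, $reader->name . '=' . $reader->value;
    } while ($reader->moveToNextAttribute == 1);
    $reader->moveToElement;            # go back to the owning element
}

my $count = $reader->attributeCount;
my $back  = $reader->name;

print "attributes ($count): @pairs\n";
print "cursor is back on: $back\n";
```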

       isNamespaceDecl ()
           Determines whether the current node is a namespace declaration
           rather than a regular attribute.

           Returns 1 if the current node is a namespace declaration, 0 if it
           is a regular attribute or other type of node, or -1 in case of
           error.

OTHER METHODS
       lookupNamespace (prefix)
           Resolves a namespace prefix in the scope of the current element.

           Returns a string containing the namespace URI to which the prefix
           maps or undef in case of error.
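
           A minimal sketch of the namespace-related accessors, assuming an
           in-memory document (the sample XML, the namespace URI
           urn:example:x and the variable names are hypothetical).

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample input (hypothetical, for illustration only).
my $reader = XML::LibXML::Reader->new(
    string => '<r xmlns:x="urn:example:x"><x:a/></r>'
);

$reader->read;    # cursor is on <r>
$reader->read;    # cursor is on <x:a>

my %info = (
    name   => $reader->name,                  # qualified name
    local  => $reader->localName,
    prefix => $reader->prefix,
    uri    => $reader->namespaceURI,
    lookup => $reader->lookupNamespace('x'),  # resolve prefix in scope
);

print "$_: $info{$_}\n" for sort keys %info;
```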

       encoding ()
           Returns a string containing the encoding of the document or undef
           in case of error.

       standalone ()
           Determines the standalone status of the document being read.
           Returns 1 if the document was declared to be standalone, 0 if it
           was declared to be not standalone, or -1 if the document did not
           specify its standalone status or in case of error.

       xmlVersion ()
           Determines the XML version of the document being read. Returns a
           string containing the XML version of the document or undef in case
           of error.

       baseURI ()
           Returns the base URI of the current node.

       isValid ()
           Retrieves the validity status from the parser.

           Returns 1 if valid, 0 if not, and -1 in case of error.

       xmlLang ()
           Returns the xml:lang scope within which the node resides.

       lineNumber ()
           Provides the line number of the current parsing point.

       columnNumber ()
           Provides the column number of the current parsing point.

       byteConsumed ()
           Provides the current index of the parser relative to the start of
           the current entity. The index is counted in bytes, starting at
           zero and ending at the size of the file in bytes when parsing a
           file. The call has constant cost if the input is UTF-8 but can be
           costly on non-UTF-8 input.

       setParserProp (prop => value, ...)
           Changes the parser processing behaviour by changing some of its
           internal properties. The following properties are available with
           this function: "load_ext_dtd", "complete_attributes",
           "validation", "expand_entities".

           Since some of the properties can only be changed before any read
           has been done, it is best to set the parsing properties in the
           constructor.

           Returns 0 if the call was successful, or -1 in case of error.

       getParserProp (prop)
           Gets the value of a parser internal property. The following
           property names can be used: "load_ext_dtd", "complete_attributes",
           "validation", "expand_entities".

           Returns the value, usually 0 or 1, or -1 in case of error.

DESTRUCTION
       XML::LibXML takes care of the reader object destruction when the last
       reference to the reader object goes out of scope. The document tree is
       preserved, though, if either of $reader->document or
       $reader->preserveNode was used and references to the document tree
       exist.

NODE TYPES
       The reader interface provides the following constants for node types
       (the constant symbols are exported by default or if tag ":types" is
       used).

         XML_READER_TYPE_NONE                    => 0
         XML_READER_TYPE_ELEMENT                 => 1
         XML_READER_TYPE_ATTRIBUTE               => 2
         XML_READER_TYPE_TEXT                    => 3
         XML_READER_TYPE_CDATA                   => 4
         XML_READER_TYPE_ENTITY_REFERENCE        => 5
         XML_READER_TYPE_ENTITY                  => 6
         XML_READER_TYPE_PROCESSING_INSTRUCTION  => 7
         XML_READER_TYPE_COMMENT                 => 8
         XML_READER_TYPE_DOCUMENT                => 9
         XML_READER_TYPE_DOCUMENT_TYPE           => 10
         XML_READER_TYPE_DOCUMENT_FRAGMENT       => 11
         XML_READER_TYPE_NOTATION                => 12
         XML_READER_TYPE_WHITESPACE              => 13
         XML_READER_TYPE_SIGNIFICANT_WHITESPACE  => 14
         XML_READER_TYPE_END_ELEMENT             => 15
         XML_READER_TYPE_END_ENTITY              => 16
         XML_READER_TYPE_XML_DECLARATION         => 17
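
       As a minimal sketch of how these constants might be used, the example
       below maps a few node types to readable labels and traces a small
       document. The labels, the sample XML and the variable names are
       hypothetical; the constants are called with "()" so that Perl does not
       treat them as bareword hash keys.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical labels for a few of the exported node type constants.
my %type_label = (
    XML_READER_TYPE_ELEMENT()     => 'element',
    XML_READER_TYPE_TEXT()        => 'text',
    XML_READER_TYPE_COMMENT()     => 'comment',
    XML_READER_TYPE_END_ELEMENT() => 'end-element',
);

# Sample input (hypothetical, for illustration only).
my $reader = XML::LibXML::Reader->new(string => '<a>hi<!--c--></a>');

my @trace;
while ($reader->read) {
    my $type = $reader->nodeType;
    # Fall back to the numeric type for anything not in the table.
    push @trace, $type_label{$type} // "type-$type";
}

print "@trace\n";
```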

STATES
       The following constants represent the values returned by "readState()".
       They are exported by default, or if tag ":states" is used:

         XML_READER_NONE      => -1
         XML_READER_START     =>  0
         XML_READER_ELEMENT   =>  1
         XML_READER_END       =>  2
         XML_READER_EMPTY     =>  3
         XML_READER_BACKTRACK =>  4
         XML_READER_DONE      =>  5
         XML_READER_ERROR     =>  6

SEE ALSO
       XML::LibXML::Pattern for information about compiled patterns.

       http://xmlsoft.org/html/libxml-xmlreader.html

       http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html

ORIGINAL IMPLEMENTATION
       Heiko Klein, <H.Klein@gmx.net> and Petr Pajas

AUTHORS
       Matt Sergeant, Christian Glahn, Petr Pajas

VERSION
       2.0207

COPYRIGHT
       2001-2007, AxKit.com Ltd.

       2002-2006, Christian Glahn.

       2006-2009, Petr Pajas.

LICENSE
       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.34.0                      2021-07-23            XML::LibXML::Reader(3)