XML::LibXML::Parser(3)    User Contributed Perl Documentation    XML::LibXML::Parser(3)
2
3
4
NAME
6 XML::LibXML::Parser - Parsing XML Data with XML::LibXML
7
SYNOPSIS
9 use XML::LibXML '1.70';
10
11 # Parser constructor
12
13 $parser = XML::LibXML->new();
14 $parser = XML::LibXML->new(option=>value, ...);
15 $parser = XML::LibXML->new({option=>value, ...});
16
17 # Parsing XML
18
19 $dom = XML::LibXML->load_xml(
20 location => $file_or_url
21 # parser options ...
22 );
23 $dom = XML::LibXML->load_xml(
24 string => $xml_string
25 # parser options ...
26 );
27 $dom = XML::LibXML->load_xml(
28 string => (\$xml_string)
29 # parser options ...
30 );
$dom = XML::LibXML->load_xml({
IO => $perl_file_handle
# parser options ...
});
35 $dom = $parser->load_xml(...);
36
37 # Parsing HTML
38
39 $dom = XML::LibXML->load_html(...);
40 $dom = $parser->load_html(...);
41
42 # Parsing well-balanced XML chunks
43
44 $fragment = $parser->parse_balanced_chunk( $wbxmlstring, $encoding );
45
46 # Processing XInclude
47
48 $parser->process_xincludes( $doc );
49 $parser->processXIncludes( $doc );
50
51 # Old-style parser interfaces
52
53 $doc = $parser->parse_file( $xmlfilename );
54 $doc = $parser->parse_fh( $io_fh );
55 $doc = $parser->parse_string( $xmlstring);
56 $doc = $parser->parse_html_file( $htmlfile, \%opts );
57 $doc = $parser->parse_html_fh( $io_fh, \%opts );
58 $doc = $parser->parse_html_string( $htmlstring, \%opts );
59
60 # Push parser
61
62 $parser->parse_chunk($string, $terminate);
63 $parser->init_push();
64 $parser->push(@data);
65 $doc = $parser->finish_push( $recover );
66
67 # Set/query parser options
68
69 $parser->option_exists($name);
70 $parser->get_option($name);
71 $parser->set_option($name,$value);
72 $parser->set_options({$name=>$value,...});
73
74 # XML catalogs
75
76 $parser->load_catalog( $catalog_file );
77
PARSING
79 An XML document is read into a data structure such as a DOM tree by a
80 piece of software, called a parser. XML::LibXML currently provides four
81 different parser interfaces:
82
83 • A DOM Pull-Parser
84
85 • A DOM Push-Parser
86
87 • A SAX Parser
88
89 • A DOM based SAX Parser.
90
91 Creating a Parser Instance
92 XML::LibXML provides an OO interface to the libxml2 parser functions.
93 Thus you have to create a parser instance before you can parse any XML
94 data.
95
96 new
97 $parser = XML::LibXML->new();
98 $parser = XML::LibXML->new(option=>value, ...);
99 $parser = XML::LibXML->new({option=>value, ...});
100
Create a new XML and HTML parser instance. Each parser instance
holds default values for various parser options. Optionally, one
can pass a hash reference or a list of option => value pairs to set
a different default set of options. Unless specified otherwise,
the options "load_ext_dtd" and "expand_entities" are set to 1. See
"Parser Options" for a list of the libxml2 parser's options.
107
108 DOM Parser
One of the common parser interfaces of XML::LibXML is the DOM parser.
This parser reads XML data into a DOM-like data structure, so that each
tag can be accessed and transformed.

XML::LibXML's DOM parser can parse not only XML data but also (strict)
HTML files. There are three ways to parse documents: from a string, from
a Perl filehandle, or from a filename/URL. The return value from each is
an XML::LibXML::Document object, which is a DOM object.

All of the functions listed below throw an exception if the document is
invalid. To prevent this from terminating your program, wrap the call
in an eval{} block.
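The three input sources and the eval{} guard can be sketched as follows.
This is a minimal illustration, not part of the module: the sample
document, the temporary file and the variable names are assumptions made
for the example.

```perl
use strict;
use warnings;
use XML::LibXML;
use File::Temp qw(tempfile);

my $xml = '<greeting>hello</greeting>';

# Write the sample to a temporary file so it can be read by name and by handle.
my ($tmp_fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$tmp_fh} $xml;
close $tmp_fh or die "cannot close $path: $!";

# 1. From a string, 2. from a filename.
my $from_string = XML::LibXML->load_xml(string   => $xml);
my $from_file   = XML::LibXML->load_xml(location => $path);

# 3. From an open Perl filehandle.
open my $in, '<', $path or die "cannot open $path: $!";
binmode $in;    # leave encoding detection to the parser
my $from_handle = XML::LibXML->load_xml(IO => $in);
close $in;

# A document with a missing end tag makes the parser die; eval traps that.
my $broken      = eval { XML::LibXML->load_xml(string => '<greeting>hello') };
my $parse_error = $@;
print "parse failed: $parse_error\n" unless defined $broken;
```

Copying $@ into a lexical right after the eval avoids losing the error
to a later eval.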
121
122 load_xml
123 $dom = XML::LibXML->load_xml(
124 location => $file_or_url
125 # parser options ...
126 );
127 $dom = XML::LibXML->load_xml(
128 string => $xml_string
129 # parser options ...
130 );
131 $dom = XML::LibXML->load_xml(
132 string => (\$xml_string)
133 # parser options ...
134 );
$dom = XML::LibXML->load_xml({
IO => $perl_file_handle
# parser options ...
});
139 $dom = $parser->load_xml(...);
140
This function is available since XML::LibXML 1.70. It provides an
easy-to-use interface to the XML parser that parses a given file (or
non-HTTPS URL), string, or input stream into a DOM tree. The arguments
can be passed in a HASH reference or as name => value pairs. The
function can be called as a class method or an object method. In
both cases it internally creates a new parser instance, passing it the
specified parser options; if called as an object method, it clones
the original parser (preserving its settings) and additionally
applies the specified options to the new parser. See the
constructor "new" and "Parser Options" for more information.
151
152 Note that, due to a limitation in the underlying libxml2 library,
153 this call does not recognize HTTPS-based URLs. (It will treat an
154 HTTPS URL as a filename, likely throwing a "No such file or
155 directory" exception.)
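The cloning behaviour of the object-method form can be observed with a
short sketch. The sample XML and the choice of options are assumptions
picked for illustration only.

```perl
use strict;
use warnings;
use XML::LibXML;

# A parser instance with one non-default option.
my $parser = XML::LibXML->new(no_blanks => 1);

my $xml = "<list>\n  <item>one</item>\n  <item>two</item>\n</list>";

# Object-method call: the parser is cloned and no_network is applied
# to the clone only.
my $dom   = $parser->load_xml(string => $xml, no_network => 1);
my @items = $dom->documentElement->getChildrenByTagName('item');
printf "%d items parsed\n", scalar @items;

# The original parser keeps its own settings.
my $still_no_blanks = $parser->get_option('no_blanks');
my $no_network      = $parser->get_option('no_network');
printf "no_blanks=%s no_network=%s\n",
    $still_no_blanks ? 1 : 0, $no_network ? 1 : 0;
```

Per-call options are therefore a way to vary settings for one document
without disturbing a shared parser object.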
156
157 load_html
158 $dom = XML::LibXML->load_html(...);
159 $dom = $parser->load_html(...);
160
This function is available since XML::LibXML 1.70. It has the same
usage as "load_xml", providing an interface to the HTML parser. See
"load_xml" for more information.

Parsing HTML may cause problems, especially if the ampersand ('&') is
used. This is a common problem when the parsed HTML contains links to
CGI scripts. Such links cause the parser to throw errors. In such cases
libxml2 still parses the entire document as if there were no error, but
the error causes XML::LibXML to stop the parsing process. However, the
document is not lost. Such HTML documents should be parsed using the
recover flag. By default recovering is deactivated.
172
The functions described above are implemented to parse well-formed
documents. In some cases a program gets well-balanced XML instead of a
well-formed document (e.g. an XML fragment from a database). With
XML::LibXML it is not necessary to wrap such fragments in the code,
because XML::LibXML can also parse well-balanced XML fragments.
179
180 parse_balanced_chunk
181 $fragment = $parser->parse_balanced_chunk( $wbxmlstring, $encoding );
182
This function parses a well-balanced XML string into an
XML::LibXML::DocumentFragment. The first argument contains the
input string; the optional second argument can be used to specify the
character encoding of the input (UTF-8 is assumed by default).
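As a rough illustration, a chunk with several top-level nodes can be
parsed and then attached to an existing document. The chunk content and
the target document are made up for this sketch.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;

# Two sibling elements plus trailing text: well balanced, but not a
# well-formed document because there is no single root element.
my $fragment = $parser->parse_balanced_chunk('<a>1</a><b>2</b>tail');
my @children = $fragment->childNodes;
printf "fragment holds %d top-level nodes\n", scalar @children;

# Appending a fragment moves its children under the target element.
my $doc = XML::LibXML->load_xml(string => '<root/>');
$doc->documentElement->appendChild($fragment);
print $doc->documentElement->toString, "\n";
```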
187
188 parse_xml_chunk
This is the old name of parse_balanced_chunk(). Because it may
cause confusion with the push parser interface, this function
should no longer be used.
192
193 By default XML::LibXML does not process XInclude tags within an XML
194 Document (see options section below). XML::LibXML allows one to post-
195 process a document to expand XInclude tags.
196
197 process_xincludes
198 $parser->process_xincludes( $doc );
199
After a document is parsed into a DOM structure, you may want to
expand the document's XInclude tags. This function processes the
given document structure and expands all XInclude tags (or throws
an error) by using the flags and callbacks of the given parser
instance.

Note that the resulting tree contains some extra nodes (of type
XML_XINCLUDE_START and XML_XINCLUDE_END) after successfully
processing the document. These nodes indicate where data was
included into the original tree. If the document is serialized,
these extra nodes will not show up.

Remember: A document with processed XIncludes differs from the
original document after serialization, because the original
XInclude tags will not get restored!

If the parser flag "expand_xinclude" is set to 1, you need not
post-process the parsed document.
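A small sketch of post-processing XIncludes follows. The included file
is a temporary file created only for this example, and referring to it
by absolute path in href is an assumption that suits Unix-like systems.

```perl
use strict;
use warnings;
use XML::LibXML;
use File::Temp qw(tempfile);

# Create a small document to be included.
my ($fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<included>hello</included>';
close $fh or die "cannot close $path: $!";

my $xml = qq{<root xmlns:xi="http://www.w3.org/2001/XInclude">}
        . qq{<xi:include href="$path"/></root>};

my $parser = XML::LibXML->new;

# XInclude tags are left untouched by default ...
my $doc    = $parser->load_xml(string => $xml);
my $before = $doc->findvalue('/root/included');

# ... and are expanded on request.
$parser->process_xincludes($doc);
my $after = $doc->findvalue('/root/included');

print "before: '$before', after: '$after'\n";
```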
218
219 processXIncludes
220 $parser->processXIncludes( $doc );
221
This is an alias for process_xincludes, but with a Java-like
function name.
224
225 parse_file
226 $doc = $parser->parse_file( $xmlfilename );
227
This function parses an XML document from a file or network;
$xmlfilename can be either a filename or a (non-HTTPS) URL. Note
that for parsing files, this function is the fastest choice, about
6-8 times faster than parse_fh().
232
233 parse_fh
234 $doc = $parser->parse_fh( $io_fh );
235
parse_fh() parses an IOREF or a subclass of IO::Handle.
237
238 Because the data comes from an open handle, libxml2's parser does
239 not know about the base URI of the document. To set the base URI
240 one should use parse_fh() as follows:
241
242 my $doc = $parser->parse_fh( $io_fh, $baseuri );
243
244 parse_string
245 $doc = $parser->parse_string( $xmlstring);
246
247 This function is similar to parse_fh(), but it parses an XML
248 document that is available as a single string in memory, or
249 alternatively as a reference to a scalar containing a string.
250 Again, you can pass an optional base URI to the function.
251
252 my $doc = $parser->parse_string( $xmlstring, $baseuri );
253 my $doc = $parser->parse_string(\$xmlstring, $baseuri);
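The effect of the optional base URI can be checked on the resulting
document. This is only a sketch; the URI below is a made-up example
value.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser   = XML::LibXML->new;
my $xml      = '<doc><link href="other.xml"/></doc>';
my $base_uri = 'http://example.org/docs/sample.xml';    # hypothetical

# Pass the string by reference to avoid copying large documents.
my $doc = $parser->parse_string(\$xml, $base_uri);

# The document remembers the base URI it was given.
print $doc->URI, "\n";
```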
254
255 parse_html_file
256 $doc = $parser->parse_html_file( $htmlfile, \%opts );
257
258 Similar to parse_file() but parses HTML (strict) documents;
259 $htmlfile can be filename or (non-HTTPS) URL.
260
261 An optional second argument can be used to pass some options to the
262 HTML parser as a HASH reference. See options labeled with HTML in
263 "Parser Options".
264
265 parse_html_fh
266 $doc = $parser->parse_html_fh( $io_fh, \%opts );
267
268 Similar to parse_fh() but parses HTML (strict) streams.
269
270 An optional second argument can be used to pass some options to the
271 HTML parser as a HASH reference. See options labeled with HTML in
272 "Parser Options".
273
274 Note: encoding option may not work correctly with this function in
275 libxml2 < 2.6.27 if the HTML file declares charset using a META
276 tag.
277
278 parse_html_string
279 $doc = $parser->parse_html_string( $htmlstring, \%opts );
280
281 Similar to parse_string() but parses HTML (strict) strings.
282
283 An optional second argument can be used to pass some options to the
284 HTML parser as a HASH reference. See options labeled with HTML in
285 "Parser Options".
286
287 Push Parser
288 XML::LibXML provides a push parser interface. Rather than pulling the
289 data from a given source the push parser waits for the data to be
290 pushed into it.
291
292 This allows one to parse large documents without waiting for the parser
293 to finish. The interface is especially useful if a program needs to
294 pre-process the incoming pieces of XML (e.g. to detect document
295 boundaries).
296
While the XML::LibXML parse_*() functions require the data to be
well-formed XML, the push parser will take any arbitrary string that
contains some XML data. The only requirement is that all the pushed
strings together form a well-formed document. With the push parser
interface a program can interrupt the parsing process as required,
where the parse_*() functions do not give enough flexibility.

Unlike the pull parser implemented in parse_fh() or parse_file(),
the push parser cannot find out about the document's end by itself.
Thus the calling program needs to indicate explicitly when the parsing
is done.
308
309 In XML::LibXML this is done by a single function:
310
311 parse_chunk
312 $parser->parse_chunk($string, $terminate);
313
parse_chunk() tries to parse a given chunk of data, which isn't
necessarily well-balanced data. The function takes two parameters:
the chunk of data as a string and, optionally, a termination flag. If
the termination flag is set to a true value (e.g. 1), the parsing
will be stopped and the resulting document will be returned, as the
following example shows:
320
321 my $parser = XML::LibXML->new;
322 for my $string ( "<", "foo", ' bar="hello world"', "/>") {
323 $parser->parse_chunk( $string );
324 }
325 my $doc = $parser->parse_chunk("", 1); # terminate the parsing
326
327 Internally XML::LibXML provides three functions that control the push
328 parser process:
329
330 init_push
331 $parser->init_push();
332
333 Initializes the push parser.
334
335 push
336 $parser->push(@data);
337
338 This function pushes the data stored inside the array to libxml2's
339 parser. Each entry in @data must be a normal scalar! This method
340 can be called repeatedly.
341
342 finish_push
343 $doc = $parser->finish_push( $recover );
344
This function returns the result of the parsing process. If this
function is called without a parameter it will complain about
non-well-formed documents. If $recover is 1, the push parser can be
used to restore broken or non-well-formed (XML) documents, as the
following examples show:
350
351 eval {
352 $parser->push( "<foo>", "bar" );
353 $doc = $parser->finish_push(); # will report broken XML
354 };
355 if ( $@ ) {
356 # ...
357 }
358
359 This can be annoying if the closing tag is missed by accident. The
360 following code will restore the document:
361
362 eval {
363 $parser->push( "<foo>", "bar" );
364 $doc = $parser->finish_push(1); # will return the data parsed
365 # unless an error happened
366 };
367
368 print $doc->toString(); # returns "<foo>bar</foo>"
369
370 Of course finish_push() will return nothing if there was no data
371 pushed to the parser before.
372
373 Pull Parser (Reader)
374 XML::LibXML also provides a pull-parser interface similar to the
375 XmlReader interface in .NET. This interface is almost streaming, and is
376 usually faster and simpler to use than SAX. See XML::LibXML::Reader.
377
378 Direct SAX Parser
379 XML::LibXML provides a direct SAX parser in the XML::LibXML::SAX
380 module.
381
382 DOM based SAX Parser
383 XML::LibXML also provides a DOM based SAX parser. The SAX parser is
384 defined in the module XML::LibXML::SAX::Parser. As it is not a stream
385 based parser, it parses documents into a DOM and traverses the DOM tree
386 instead.
387
388 The API of this parser is exactly the same as any other Perl SAX2
389 parser. See XML::SAX::Intro for details.
390
391 Aside from the regular parsing methods, you can access the DOM tree
392 traverser directly, using the generate() method:
393
my $doc = build_yourself_a_document();
my $saxparser = XML::LibXML::SAX::Parser->new( ... );
$saxparser->generate( $doc );
397
This is useful for serializing DOM trees, for example trees that you
have already processed or that you have as a result of XSLT
processing.
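A fuller sketch of generate() with a concrete handler might look like
this. The ElementCounter package is a hypothetical handler written for
the example; it relies on XML::SAX::Base, which ships with XML::SAX.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::SAX::Parser;

{
    # Hypothetical SAX handler that counts elements by name.
    package ElementCounter;
    use base 'XML::SAX::Base';

    sub start_element {
        my ($self, $element) = @_;
        $self->{seen}{ $element->{Name} }++;
    }
}

# Any existing DOM tree will do; here it is parsed from a string.
my $doc = XML::LibXML->load_xml(string => '<a><b/><b/></a>');

my $handler   = ElementCounter->new;
my $saxparser = XML::LibXML::SAX::Parser->new(Handler => $handler);

# Walk the DOM tree and fire SAX events at the handler.
$saxparser->generate($doc);

my %seen = %{ $handler->{seen} };
print "$_: $seen{$_}\n" for sort keys %seen;
```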
401
402 WARNING
403
404 This is NOT a streaming SAX parser. As I said above, this parser reads
405 the entire document into a DOM and serialises it. Some people couldn't
read that in the paragraph above so I've added this warning. If you
want a streaming SAX parser, look at the XML::LibXML::SAX man page.
408
SERIALIZATION
410 XML::LibXML provides some functions to serialize nodes and documents.
411 The serialization functions are described on the XML::LibXML::Node
412 manpage or the XML::LibXML::Document manpage. XML::LibXML checks three
413 global flags that alter the serialization process:
414
415 • skipXMLDeclaration
416
417 • skipDTD
418
419 • setTagCompression
420
Of these three flags, only setTagCompression is available for all
serialization functions.

Because XML::LibXML does not set these flags itself, one has to define
them locally, as the following example shows:
426
427 local $XML::LibXML::skipXMLDeclaration = 1;
428 local $XML::LibXML::skipDTD = 1;
429 local $XML::LibXML::setTagCompression = 1;
430
431 If skipXMLDeclaration is defined and not '0', the XML declaration is
432 omitted during serialization.
433
If skipDTD is defined and not '0', an existing DTD is not
serialized with the document.

If setTagCompression is defined and not '0', empty tags are displayed as
open and closing tags rather than the shortcut. For example the empty
tag foo will be rendered as <foo></foo> rather than <foo/>.
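The effect of the flags can be compared side by side in a short sketch.
The sample document is an assumption; the bare block keeps the local
settings from leaking into the rest of the program.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<?xml version="1.0"?><root><empty/></root>'
);

# Default serialization: XML declaration present, <empty/> kept short.
my $default = $doc->toString;

my $altered;
{
    # The flags apply only inside this block.
    local $XML::LibXML::skipXMLDeclaration = 1;
    local $XML::LibXML::setTagCompression  = 1;
    $altered = $doc->toString;
}

print "default:\n$default\n";
print "altered:\n$altered\n";
```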
440
PARSER OPTIONS
442 Handling of libxml2 parser options has been unified and improved in
443 XML::LibXML 1.70. You can now set default options for a particular
444 parser instance by passing them to the constructor as
445 "XML::LibXML->new({name=>value, ...})" or
446 "XML::LibXML->new(name=>value,...)". The options can be queried and
447 changed using the following methods (pre-1.70 interfaces such as
448 "$parser->load_ext_dtd(0)" also exist, see below):
449
450 option_exists
451 $parser->option_exists($name);
452
453 Returns 1 if the current XML::LibXML version supports the option
454 $name, otherwise returns 0 (note that this does not necessarily
455 mean that the option is supported by the underlying libxml2
456 library).
457
458 get_option
459 $parser->get_option($name);
460
461 Returns the current value of the parser option $name.
462
463 set_option
464 $parser->set_option($name,$value);
465
466 Sets option $name to value $value.
467
468 set_options
469 $parser->set_options({$name=>$value,...});
470
471 Sets multiple parsing options at once.
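The four methods can be combined as in the following sketch. The option
name no_such_option is deliberately invalid and exists only to show the
negative case.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new(no_network => 1);

# Check whether this XML::LibXML version knows an option at all.
my $known   = $parser->option_exists('no_network');
my $unknown = $parser->option_exists('no_such_option');    # hypothetical name

# Change one option, then several at once.
$parser->set_option('no_blanks', 1);
$parser->set_options({ line_numbers => 1, pedantic_parser => 1 });

# Read the current values back.
my %current = map { $_ => $parser->get_option($_) }
    qw(no_network no_blanks line_numbers pedantic_parser);

printf "%-16s %s\n", $_, $current{$_} ? 'on' : 'off' for sort keys %current;
```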
472
473 IMPORTANT NOTE: This documentation reflects the parser flags available
474 in libxml2 2.7.3. Some options have no effect if an older version of
475 libxml2 is used.
476
477 Each of the flags listed below is labeled
478
479 /parser/
480 if it can be used with a "XML::LibXML" parser object (i.e. passed
481 to "XML::LibXML->new", "XML::LibXML->set_option", etc.)
482
483 /html/
if it can be passed to the "parse_html_*" methods
485
486 /reader/
487 if it can be used with the "XML::LibXML::Reader".
488
489 Unless specified otherwise, the default for boolean valued options is 0
490 (false).
491
492 The available options are:
493
URI
/parser, html, reader/
495
496 In case of parsing strings or file handles, XML::LibXML doesn't
497 know about the base uri of the document. To make relative
498 references such as XIncludes work, one has to set a base URI, that
499 is then used for the parsed document.
500
501 line_numbers
502 /parser, html, reader/
503
504 If this option is activated, libxml2 will store the line number of
505 each element node in the parsed document. The line number can be
506 obtained using the "line_number()" method of the
507 "XML::LibXML::Node" class (for non-element nodes this may report
508 the line number of the containing element). The line numbers are
509 also used for reporting positions of validation errors.
510
511 IMPORTANT: Due to limitations in the libxml2 library line numbers
512 greater than 65535 will be returned as 65535. Unfortunately, this
513 is a long and sad story, please see
514 <http://bugzilla.gnome.org/show_bug.cgi?id=325533> for more
515 details.
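A brief sketch of reading line numbers back from parsed nodes; the
sample document is an assumption chosen so that each item sits on its
own line.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = "<list>\n  <item>one</item>\n  <item>two</item>\n</list>\n";

# Line numbers are only recorded when the option is switched on.
my $dom = XML::LibXML->load_xml(string => $xml, line_numbers => 1);

my @items = $dom->documentElement->getChildrenByTagName('item');
my @lines = map { $_->line_number } @items;

printf "item '%s' starts on line %d\n", $items[$_]->textContent, $lines[$_]
    for 0 .. $#items;
```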
516
517 encoding
518 /html/
519
520 character encoding of the input
521
522 recover
523 /parser, html, reader/
524
525 recover from errors; possible values are 0, 1, and 2
526
527 A true value turns on recovery mode which allows one to parse
528 broken XML or HTML data. The recovery mode allows the parser to
529 return the successfully parsed portion of the input document. This
530 is useful for almost well-formed documents, where for example a
531 closing tag is missing somewhere. Still, XML::LibXML will only
532 parse until the first fatal (non-recoverable) error occurs,
533 reporting recoverable parsing errors as warnings. To suppress even
534 these warnings, use recover=>2.
535
536 Note that validation is switched off automatically in recovery
537 mode.
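Recovery mode can be contrasted with strict parsing in a few lines. The
broken input below mirrors the missing end tag used in the push parser
example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $broken_xml = '<foo>bar';    # closing tag is missing

# Strict parsing (the default) throws an exception.
my $strict       = eval { XML::LibXML->load_xml(string => $broken_xml) };
my $strict_error = $@;
print "strict parsing failed\n" unless defined $strict;

# recover => 2 returns what could be parsed and keeps quiet about it.
my $recovered = XML::LibXML->load_xml(string => $broken_xml, recover => 2);
print $recovered->documentElement->toString, "\n";
```

With recover => 1 the same document is returned, but the recoverable
errors are additionally reported as warnings.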
538
539 expand_entities
540 /parser, reader/
541
542 substitute entities; possible values are 0 and 1; default is 1
543
Note that although setting this flag to 0 disables entity substitution,
it does not prevent the parser from loading external entities. When
substitution of an external entity is disabled, the entity will be
represented in the document tree by an XML_ENTITY_REF_NODE node
whose subtree will be the content obtained by parsing the external
resource. Although this nesting is visible from the DOM, it is
transparent to the XPath data model, so it is possible to match nodes
in an unexpanded entity by the same XPath expression as if the
entity were expanded. See also ext_ent_handler.
553
554 ext_ent_handler
555 /parser/
556
557 Provide a custom external entity handler to be used when
558 expand_entities is set to 1. Possible value is a subroutine
559 reference.
560
561 This feature does not work properly in libxml2 < 2.6.27!
562
563 The subroutine provided is called whenever the parser needs to
564 retrieve the content of an external entity. It is called with two
565 arguments: the system ID (URI) and the public ID. The value
566 returned by the subroutine is parsed as the content of the entity.
567
568 This method can be used to completely disable entity loading, e.g.
569 to prevent exploits of the type described at
570 (<http://searchsecuritychannel.techtarget.com/generic/0,295582,sid97_gci1304703,00.html>),
571 where a service is tricked to expose its private data by letting it
572 parse a remote file (RSS feed) that contains an entity reference to
573 a local file (e.g. "/etc/fstab").
574
575 A more granular solution to this problem, however, is provided by
576 custom URL resolvers, as in
577
578 my $c = XML::LibXML::InputCallback->new();
579 sub match { # accept file:/ URIs except for XML catalogs in /etc/xml/
580 my ($uri) = @_;
581 return ($uri=~m{^file:/}
582 and $uri !~ m{^file:///etc/xml/})
583 ? 1 : 0;
584 }
585 $c->register_callbacks([ \&match, sub{}, sub{}, sub{} ]);
586 $parser->input_callbacks($c);
587
588 load_ext_dtd
589 /parser, reader/
590
591 load the external DTD subset while parsing; possible values are 0
592 and 1. Unless specified, XML::LibXML sets this option to 1.
593
This flag is also required for DTD validation, to provide complete
attributes, and to expand entities, regardless of whether the document
has an internal subset. Thus switching off external DTD loading will
disable entity expansion, validation, and complete attributes on
internal subsets as well.
599
600 complete_attributes
601 /parser, reader/
602
603 create default DTD attributes; possible values are 0 and 1
604
605 validation
606 /parser, reader/
607
608 validate with the DTD; possible values are 0 and 1
609
610 suppress_errors
611 /parser, html, reader/
612
613 suppress error reports; possible values are 0 and 1
614
615 suppress_warnings
616 /parser, html, reader/
617
618 suppress warning reports; possible values are 0 and 1
619
620 pedantic_parser
621 /parser, html, reader/
622
623 pedantic error reporting; possible values are 0 and 1
624
625 no_blanks
626 /parser, html, reader/
627
628 remove blank nodes; possible values are 0 and 1
629
630 no_defdtd
631 /html/
632
633 do not add a default DOCTYPE; possible values are 0 and 1
634
635 the default is (0) to add a DTD when the input html lacks one
636
637 expand_xinclude or xinclude
638 /parser, reader/
639
640 Implement XInclude substitution; possible values are 0 and 1
641
642 Expands XInclude tags immediately while parsing the document. Note
643 that the parser will use the URI resolvers installed via
644 "XML::LibXML::InputCallback" to parse the included document (if
645 any).
646
647 no_xinclude_nodes
648 /parser, reader/
649
650 do not generate XINCLUDE START/END nodes; possible values are 0 and
651 1
652
653 no_network
654 /parser, html, reader/
655
656 Forbid network access; possible values are 0 and 1
657
658 If set to true, all attempts to fetch non-local resources (such as
659 DTD or external entities) will fail (unless custom callbacks are
660 defined).
661
662 It may be necessary to use the flag "recover" for processing
663 documents requiring such resources while networking is off.
664
665 clean_namespaces
666 /parser, reader/
667
remove redundant namespace declarations during parsing; possible
values are 0 and 1.
670
671 no_cdata
672 /parser, html, reader/
673
674 merge CDATA as text nodes; possible values are 0 and 1
675
676 no_basefix
677 /parser, reader/
678
do not fix up XInclude xml:base URIs; possible values are 0 and 1
680
681 huge
682 /parser, html, reader/
683
684 relax any hardcoded limit from the parser; possible values are 0
685 and 1. Unless specified, XML::LibXML sets this option to 0.
686
687 Note: the default value for this option was changed to protect
688 against denial of service through entity expansion attacks. Before
689 enabling the option ensure you have taken alternative measures to
690 protect your application against this type of attack.
691
692 gdome
693 /parser/
694
695 THIS OPTION IS EXPERIMENTAL!
696
697 Although quite powerful, XML::LibXML's DOM implementation is
698 incomplete with respect to the DOM level 2 or level 3
699 specifications. XML::GDOME is based on libxml2 as well, and
700 provides a rather complete DOM implementation by wrapping libgdome.
701 This flag allows you to make use of XML::LibXML's full parser
702 options and XML::GDOME's DOM implementation at the same time.
703
704 To make use of this function, one has to install libgdome and
705 configure XML::LibXML to use this library. For this you need to
706 rebuild XML::LibXML!
707
708 Note: this feature was not seriously tested in recent XML::LibXML
709 releases.
710
711 For compatibility with XML::LibXML versions prior to 1.70, the
712 following methods are also supported for querying and setting the
713 corresponding parser options (if called without arguments, the methods
714 return the current value of the corresponding parser options; with an
715 argument sets the option to a given value):
716
717 $parser->validation();
718 $parser->recover();
719 $parser->pedantic_parser();
720 $parser->line_numbers();
721 $parser->load_ext_dtd();
722 $parser->complete_attributes();
723 $parser->expand_xinclude();
724 $parser->gdome_dom();
725 $parser->clean_namespaces();
726 $parser->no_network();
727
728 The following obsolete methods trigger parser options in some special
729 way:
730
731 recover_silently
732 $parser->recover_silently(1);
733
734 If called without an argument, returns true if the current value of
735 the "recover" parser option is 2 and returns false otherwise. With
736 a true argument sets the "recover" parser option to 2; with a false
737 argument sets the "recover" parser option to 0.
738
739 expand_entities
740 $parser->expand_entities(0);
741
742 Get/set the "expand_entities" option. If called with a true
743 argument, also turns the "load_ext_dtd" option to 1.
744
745 keep_blanks
746 $parser->keep_blanks(0);
747
748 This is actually the opposite of the "no_blanks" parser option. If
749 used without an argument retrieves negated value of "no_blanks". If
750 used with an argument sets "no_blanks" to the opposite value.
751
752 base_uri
753 $parser->base_uri( $your_base_uri );
754
755 Get/set the "URI" option.
756
XML CATALOGS
758 "libxml2" supports XML catalogs. Catalogs are used to map remote
759 resources to their local copies. Using catalogs can speed up parsing
760 processes if many external resources from remote addresses are loaded
761 into the parsed documents (such as DTDs or XIncludes).
762
763 Note that libxml2 has a global pool of loaded catalogs, so if you apply
764 the method "load_catalog" to one parser instance, all parser instances
765 will start using the catalog (in addition to other previously loaded
766 catalogs).
767
Note also that catalogs are not used when a custom external entity
handler is specified. Currently it is not possible to make
use of both types of resolving systems at the same time.
771
772 load_catalog
773 $parser->load_catalog( $catalog_file );
774
775 Loads the XML catalog file $catalog_file.
776
777 # Global external entity loader (similar to ext_ent_handler option
778 # but this works really globally, also in XML::LibXSLT include etc..)
779
780 XML::LibXML::externalEntityLoader(\&my_loader);
781
ERROR REPORTING
783 XML::LibXML throws exceptions during parsing, validation or XPath
784 processing (and some other occasions). These errors can be caught by
785 using eval blocks. The error is stored in $@. There are two
786 implementations: the old one throws $@ which is just a message string,
787 in the new one $@ is an object from the class XML::LibXML::Error; this
788 class overrides the operator "" so that when printed, the object
789 flattens to the usual error message.
790
791 XML::LibXML throws errors as they occur. This is a very common
792 misunderstanding in the use of XML::LibXML. If the eval is omitted,
793 XML::LibXML will always halt your script by "croaking" (see Carp man
794 page for details).
795
Also note that an increasing number of functions throw errors if bad
data is passed as arguments. If you cannot ensure that valid data is
passed to XML::LibXML, you should wrap these calls in eval.
799
800 Note: since version 1.59, get_last_error() is no longer available in
801 XML::LibXML for thread-safety reasons.
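Catching and inspecting an error might look like the sketch below. It
assumes the newer implementation, in which $@ is an XML::LibXML::Error
object, and falls back to plain string handling otherwise.

```perl
use strict;
use warnings;
use XML::LibXML;

# The end tag does not match the open element, so parsing fails.
my $doc = eval { XML::LibXML->load_xml(string => "<a>\n<b>\n</a>") };
my $err = $@;

my $report = '';
if (ref $err and $err->isa('XML::LibXML::Error')) {
    # Structured access to the details of the error.
    $report = sprintf "line %d: %s", $err->line, $err->message;
}
elsif ($err) {
    # Old-style error: a plain message string.
    $report = "$err";
}
print "$report\n" if $report;
```

Because the error class overloads stringification, printing $err
directly also yields the usual message.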
802
AUTHORS
804 Matt Sergeant, Christian Glahn, Petr Pajas
805
VERSION
807 2.0207
808
COPYRIGHT
810 2001-2007, AxKit.com Ltd.
811
812 2002-2006, Christian Glahn.
813
814 2006-2009, Petr Pajas.
815
LICENSE
817 This program is free software; you can redistribute it and/or modify it
818 under the same terms as Perl itself.
819
820
821
perl v5.34.0    2021-07-23    XML::LibXML::Parser(3)