XML::LibXML::Parser(3)  User Contributed Perl Documentation  XML::LibXML::Parser(3)

NAME

       XML::LibXML::Parser - Parsing XML Data with XML::LibXML


SYNOPSIS

         use XML::LibXML '1.70';

         # Parser constructor

         $parser = XML::LibXML->new();
         $parser = XML::LibXML->new(option=>value, ...);
         $parser = XML::LibXML->new({option=>value, ...});

         # Parsing XML

         $dom = XML::LibXML->load_xml(
             location => $file_or_url
             # parser options ...
           );
         $dom = XML::LibXML->load_xml(
             string => $xml_string
             # parser options ...
           );
         $dom = XML::LibXML->load_xml(
             string => (\$xml_string)
             # parser options ...
           );
         $dom = XML::LibXML->load_xml({
             IO => $perl_file_handle
             # parser options ...
           });
         $dom = $parser->load_xml(...);

         # Parsing HTML

         $dom = XML::LibXML->load_html(...);
         $dom = $parser->load_html(...);

         # Parsing well-balanced XML chunks

         $fragment = $parser->parse_balanced_chunk( $wbxmlstring, $encoding );

         # Processing XInclude

         $parser->process_xincludes( $doc );
         $parser->processXIncludes( $doc );

         # Old-style parser interfaces

         $doc = $parser->parse_file( $xmlfilename );
         $doc = $parser->parse_fh( $io_fh );
         $doc = $parser->parse_string( $xmlstring);
         $doc = $parser->parse_html_file( $htmlfile, \%opts );
         $doc = $parser->parse_html_fh( $io_fh, \%opts );
         $doc = $parser->parse_html_string( $htmlstring, \%opts );

         # Push parser

         $parser->parse_chunk($string, $terminate);
         $parser->init_push();
         $parser->push(@data);
         $doc = $parser->finish_push( $recover );

         # Set/query parser options

         $parser->option_exists($name);
         $parser->get_option($name);
         $parser->set_option($name,$value);
         $parser->set_options({$name=>$value,...});

         # XML catalogs

         $parser->load_catalog( $catalog_file );


PARSING

79       An XML document is read into a data structure such as a DOM tree by a
80       piece of software, called a parser. XML::LibXML currently provides four
81       different parser interfaces:
82
83       ·   A DOM Pull-Parser
84
85       ·   A DOM Push-Parser
86
87       ·   A SAX Parser
88
89       ·   A DOM based SAX Parser.
90
91   Creating a Parser Instance
92       XML::LibXML provides an OO interface to the libxml2 parser functions.
93       Thus you have to create a parser instance before you can parse any XML
94       data.
95
96       new
97             $parser = XML::LibXML->new();
98             $parser = XML::LibXML->new(option=>value, ...);
99             $parser = XML::LibXML->new({option=>value, ...});
100
           Create a new XML and HTML parser instance. Each parser instance
           holds default values for various parser options. Optionally, one
           can pass a hash reference or a list of option => value pairs to set
           a different default set of options. Unless specified otherwise,
           the options "load_ext_dtd" and "expand_entities" are set to 1. See
           "Parser Options" for a list of the libxml2 parser's options.
107
108   DOM Parser
       One of the common parser interfaces of XML::LibXML is the DOM parser.
       This parser reads XML data into a DOM-like data structure, so each tag
       can be accessed and transformed.

       XML::LibXML's DOM parser can parse not only XML data but also
       (strict) HTML files. There are three ways to parse documents: as a
       string, as a Perl filehandle, or as a filename/URL. The return value
       from each is an XML::LibXML::Document object, which is a DOM object.

       All of the functions listed below will throw an exception if the
       document is invalid. To prevent this from terminating your program,
       wrap the call in an eval{} block.
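
       A minimal sketch of that advice, assuming the input arrives as a
       string. The helper name try_parse is hypothetical and not part of
       XML::LibXML:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns ($doc, undef) on success or
# (undef, $message) when the parser throws an exception.
sub try_parse {
    my ($xml) = @_;
    my $doc = eval { XML::LibXML->load_xml( string => $xml ) };
    return ( $doc, undef ) if defined $doc;
    return ( undef, "$@" );    # stringify the exception
}

my ( $good, $good_err ) = try_parse('<root><item/></root>');
my ( $bad,  $bad_err )  = try_parse('<root><item></root>');

print "first document parsed\n" if defined $good;
print "second document failed: $bad_err\n" if !defined $bad;
```

       Assigning the result of eval{} to a variable keeps the success
       check independent of $@, which later code may overwrite.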
121
122       load_xml
             $dom = XML::LibXML->load_xml(
                 location => $file_or_url
                 # parser options ...
               );
             $dom = XML::LibXML->load_xml(
                 string => $xml_string
                 # parser options ...
               );
             $dom = XML::LibXML->load_xml(
                 string => (\$xml_string)
                 # parser options ...
               );
             $dom = XML::LibXML->load_xml({
                 IO => $perl_file_handle
                 # parser options ...
               });
             $dom = $parser->load_xml(...);
140
           This function is available since XML::LibXML 1.70. It provides an
           easy-to-use interface to the XML parser that parses a given file
           (or non-HTTPS URL), string, or input stream to a DOM tree. The
           arguments can be passed in a HASH reference or as name => value
           pairs. The function can be called as a class method or an object
           method. In both cases it internally creates a new parser instance
           passing the specified parser options; if called as an object
           method, it clones the original parser (preserving its settings)
           and additionally applies the specified options to the new parser.
           See the constructor "new" and "Parser Options" for more
           information.
151
152           Note that, due to a limitation in the underlying libxml2 library,
153           this call does not recognize HTTPS-based URLs. (It will treat an
154           HTTPS URL as a filename, likely throwing a "No such file or
155           directory" exception.)
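
           The difference between the class-method and object-method forms
           can be sketched as follows. The sample document and variable
           names are illustrative assumptions; "no_blanks" is one of the
           options listed under "Parser Options":

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = "<a>\n  <b/>\n</a>";

# Class method: a fresh parser with default options keeps the
# whitespace-only text nodes around <b/>.
my $doc_default = XML::LibXML->load_xml( string => $xml );

# Object method: the parser is cloned, so its no_blanks setting
# is applied to this parse as well.
my $parser       = XML::LibXML->new( no_blanks => 1 );
my $doc_noblanks = $parser->load_xml( string => $xml );

my @default_children  = $doc_default->documentElement->childNodes;
my @noblanks_children = $doc_noblanks->documentElement->childNodes;

printf "default: %d child nodes, no_blanks: %d child nodes\n",
    scalar @default_children, scalar @noblanks_children;
```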
156
157       load_html
158             $dom = XML::LibXML->load_html(...);
159             $dom = $parser->load_html(...);
160
           This function is available since XML::LibXML 1.70. It has the same
           usage as "load_xml", providing an interface to the HTML parser.
           See "load_xml" for more information.
164
       Parsing HTML may cause problems, especially if the ampersand ('&') is
       used. This is a common problem if HTML code is parsed that contains
       links to CGI scripts. Such links cause the parser to throw errors. In
       such cases libxml2 still parses the entire document as if there were
       no error, but the error causes XML::LibXML to stop the parsing
       process. However, the document is not lost. Such HTML documents
       should be parsed using the recover flag. By default recovering is
       deactivated.

       The functions described above are implemented to parse well-formed
       documents. In some cases a program gets well-balanced XML instead of
       well-formed documents (e.g. an XML fragment from a database). With
       XML::LibXML it is not required to wrap such fragments in the code,
       because XML::LibXML can also parse well-balanced XML fragments.
179
180       parse_balanced_chunk
181             $fragment = $parser->parse_balanced_chunk( $wbxmlstring, $encoding );
182
           This function parses a well-balanced XML string into an
           XML::LibXML::DocumentFragment. The first argument contains the
           input string; the optional second argument can be used to specify
           the character encoding of the input (UTF-8 is assumed by default).
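
           As an illustration, the following sketch parses a chunk that
           has no single root element and inspects the resulting fragment.
           The sample input is an assumption made for this example:

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;

# Two sibling elements with text between them: well balanced, but not
# a well-formed document because there is no single root element.
my $fragment = $parser->parse_balanced_chunk('<a/>text<b>more</b>');

my @nodes = $fragment->childNodes;
print ref($fragment), " with ", scalar(@nodes), " top-level nodes\n";
print "  ", $_->nodeName, "\n" for @nodes;
```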
187
188       parse_xml_chunk
           This is the old name of parse_balanced_chunk(). Because it may
           cause confusion with the push parser interface, this function
           should not be used anymore.
192
193       By default XML::LibXML does not process XInclude tags within an XML
194       Document (see options section below). XML::LibXML allows one to post-
195       process a document to expand XInclude tags.
196
197       process_xincludes
198             $parser->process_xincludes( $doc );
199
           After a document is parsed into a DOM structure, you may want to
           expand the document's XInclude tags. This function processes the
           given document structure and expands all XInclude tags (or throws
           an error) by using the flags and callbacks of the given parser
           instance.

           Note that the resulting tree contains some extra nodes (of type
           XML_XINCLUDE_START and XML_XINCLUDE_END) after successfully
           processing the document. These nodes indicate where data was
           included into the original tree. If the document is serialized,
           these extra nodes will not show up.

           Remember: A document with processed XIncludes differs from the
           original document after serialization, because the original
           XInclude tags will not get restored!

           If the parser flag "expand_xinclude" is set to 1, you need not
           post-process the parsed document.
218
219       processXIncludes
220             $parser->processXIncludes( $doc );
221
           This is an alias to process_xincludes, but through a Java-like
           function name.
224
225       parse_file
226             $doc = $parser->parse_file( $xmlfilename );
227
           This function parses an XML document from a file or network;
           $xmlfilename can be either a filename or a (non-HTTPS) URL. Note
           that for parsing files, this function is the fastest choice, about
           6-8 times faster than parse_fh().
232
233       parse_fh
234             $doc = $parser->parse_fh( $io_fh );
235
           parse_fh() parses an IOREF or a subclass of IO::Handle.
237
238           Because the data comes from an open handle, libxml2's parser does
239           not know about the base URI of the document. To set the base URI
240           one should use parse_fh() as follows:
241
242             my $doc = $parser->parse_fh( $io_fh, $baseuri );
243
244       parse_string
245             $doc = $parser->parse_string( $xmlstring);
246
247           This function is similar to parse_fh(), but it parses an XML
248           document that is available as a single string in memory, or
249           alternatively as a reference to a scalar containing a string.
250           Again, you can pass an optional base URI to the function.
251
252             my $doc = $parser->parse_string( $xmlstring, $baseuri );
253             my $doc = $parser->parse_string(\$xmlstring, $baseuri);
254
255       parse_html_file
256             $doc = $parser->parse_html_file( $htmlfile, \%opts );
257
           Similar to parse_file() but parses HTML (strict) documents;
           $htmlfile can be a filename or a (non-HTTPS) URL.
260
261           An optional second argument can be used to pass some options to the
262           HTML parser as a HASH reference. See options labeled with HTML in
263           "Parser Options".
264
265       parse_html_fh
266             $doc = $parser->parse_html_fh( $io_fh, \%opts );
267
268           Similar to parse_fh() but parses HTML (strict) streams.
269
270           An optional second argument can be used to pass some options to the
271           HTML parser as a HASH reference. See options labeled with HTML in
272           "Parser Options".
273
274           Note: encoding option may not work correctly with this function in
275           libxml2 < 2.6.27 if the HTML file declares charset using a META
276           tag.
277
278       parse_html_string
279             $doc = $parser->parse_html_string( $htmlstring, \%opts );
280
281           Similar to parse_string() but parses HTML (strict) strings.
282
283           An optional second argument can be used to pass some options to the
284           HTML parser as a HASH reference. See options labeled with HTML in
285           "Parser Options".
286
287   Push Parser
       XML::LibXML provides a push parser interface. Rather than pulling the
       data from a given source, the push parser waits for the data to be
       pushed into it.

       This allows one to parse large documents without waiting for the parser
       to finish. The interface is especially useful if a program needs to
       pre-process the incoming pieces of XML (e.g. to detect document
       boundaries).

       While the XML::LibXML parse_*() functions require the data to be
       well-formed XML, the push parser will take any arbitrary string that
       contains some XML data. The only requirement is that all the pushed
       strings together form a well-formed document. With the push parser
       interface a program can interrupt the parsing process as required,
       where the parse_*() functions do not give enough flexibility.

       Unlike the pull parser implemented in parse_fh() or parse_file(),
       the push parser cannot determine the document's end by itself.
       Thus the calling program needs to indicate explicitly when the parsing
       is done.

       In XML::LibXML this is done by a single function:
310
311       parse_chunk
312             $parser->parse_chunk($string, $terminate);
313
           parse_chunk() tries to parse a given chunk of data, which isn't
           necessarily well-balanced. The function takes two parameters:
           the chunk of data as a string and, optionally, a termination
           flag. If the termination flag is set to a true value (e.g. 1),
           the parsing will be stopped and the resulting document will be
           returned, as the following example shows:

             my $parser = XML::LibXML->new;
             for my $string ( "<", "foo", ' bar="hello world"', "/>") {
                  $parser->parse_chunk( $string );
             }
             my $doc = $parser->parse_chunk("", 1); # terminate the parsing
326
327       Internally XML::LibXML provides three functions that control the push
328       parser process:
329
330       init_push
331             $parser->init_push();
332
333           Initializes the push parser.
334
335       push
336             $parser->push(@data);
337
338           This function pushes the data stored inside the array to libxml2's
339           parser. Each entry in @data must be a normal scalar! This method
340           can be called repeatedly.
341
342       finish_push
343             $doc = $parser->finish_push( $recover );
344
           This function returns the result of the parsing process. If this
           function is called without a parameter it will complain about
           documents that are not well-formed. If $recover is 1, the push
           parser can be used to restore broken or not well-formed (XML)
           documents. Without it, the following example reports an error:

             eval {
                 $parser->push( "<foo>", "bar" );
                 $doc = $parser->finish_push();    # will report broken XML
             };
             if ( $@ ) {
                # ...
             }

           This can be annoying if the closing tag is missed by accident. The
           following code will restore the document:

             eval {
                 $parser->push( "<foo>", "bar" );
                 $doc = $parser->finish_push(1);   # will return the data parsed
                                                   # unless an error happened
             };

             print $doc->toString(); # prints the restored <foo>bar</foo>

           Of course finish_push() will return nothing if no data was pushed
           to the parser before.
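
       The three functions can be combined to feed the parser from a
       stream in fixed-size blocks. The following is only a sketch: the
       in-memory filehandle, the sample data and the block size of 8 bytes
       are assumptions chosen to keep the example self-contained.

```perl
use strict;
use warnings;
use XML::LibXML;

# An in-memory handle stands in for a socket or a large file.
my $xml = '<log><entry id="1"/><entry id="2"/></log>';
open my $fh, '<', \$xml or die "cannot open in-memory handle: $!";

my $parser = XML::LibXML->new;
$parser->init_push;

# Hand the data to the parser in small blocks as it becomes available.
while ( read( $fh, my $buffer, 8 ) ) {
    $parser->push($buffer);
}
close $fh;

# Tell the parser that the document is complete and fetch the result.
my $doc     = $parser->finish_push;
my @entries = $doc->findnodes('/log/entry');
print scalar(@entries), " entries parsed\n";
```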
372
373   Pull Parser (Reader)
374       XML::LibXML also provides a pull-parser interface similar to the
375       XmlReader interface in .NET. This interface is almost streaming, and is
376       usually faster and simpler to use than SAX. See XML::LibXML::Reader.
377
378   Direct SAX Parser
379       XML::LibXML provides a direct SAX parser in the XML::LibXML::SAX
380       module.
381
382   DOM based SAX Parser
383       XML::LibXML also provides a DOM based SAX parser. The SAX parser is
384       defined in the module XML::LibXML::SAX::Parser. As it is not a stream
385       based parser, it parses documents into a DOM and traverses the DOM tree
386       instead.
387
388       The API of this parser is exactly the same as any other Perl SAX2
389       parser. See XML::SAX::Intro for details.
390
       Aside from the regular parsing methods, you can access the DOM tree
       traverser directly, using the generate() method:

         my $doc = build_yourself_a_document();
         my $saxparser = XML::LibXML::SAX::Parser->new( ... );
         $saxparser->generate( $doc );

       This is useful for serializing DOM trees, for example trees that you
       have already processed or that result from XSLT processing.
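
       A slightly fuller sketch of generate() follows. It assumes a SAX2
       handler derived from XML::SAX::Base; the ElementCounter package and
       its "counts" key are hypothetical names invented for this example.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::SAX::Parser;

# Hypothetical handler that counts start_element events per element name.
package ElementCounter {
    use parent 'XML::SAX::Base';

    sub start_element {
        my ( $self, $element ) = @_;
        $self->{counts}{ $element->{Name} }++;
    }
}

my $doc = XML::LibXML->load_xml( string => '<list><item/><item/></list>' );

# Traverse the existing DOM tree and emit SAX events to the handler.
my $handler   = ElementCounter->new;
my $saxparser = XML::LibXML::SAX::Parser->new( Handler => $handler );
$saxparser->generate($doc);

my %counts = %{ $handler->{counts} };
print "$_: $counts{$_}\n" for sort keys %counts;
```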
401
402       WARNING
403
       This is NOT a streaming SAX parser. As I said above, this parser reads
       the entire document into a DOM and serialises it. Some people couldn't
       read that in the paragraph above so I've added this warning. If you
       want a streaming SAX parser look at the XML::LibXML::SAX man page.
408

SERIALIZATION

410       XML::LibXML provides some functions to serialize nodes and documents.
411       The serialization functions are described on the XML::LibXML::Node
412       manpage or the XML::LibXML::Document manpage. XML::LibXML checks three
413       global flags that alter the serialization process:
414
415       ·   skipXMLDeclaration
416
417       ·   skipDTD
418
419       ·   setTagCompression
420
       Of these three flags, only setTagCompression is available for all
       serialization functions.

       Because XML::LibXML does not set these flags itself, one has to define
       them locally as the following example shows:

         local $XML::LibXML::skipXMLDeclaration = 1;
         local $XML::LibXML::skipDTD = 1;
         local $XML::LibXML::setTagCompression = 1;

       If skipXMLDeclaration is defined and not '0', the XML declaration is
       omitted during serialization.

       If skipDTD is defined and not '0', an existing DTD will not be
       serialized with the document.

       If setTagCompression is defined and not '0', empty tags are displayed
       as open and closing tags rather than the shortcut. For example the
       empty tag foo will be rendered as <foo></foo> rather than <foo/>.
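
       The effect of the flags can be observed by serializing the same
       document with and without them. This is a small sketch; the
       one-element document and the variable names are assumptions made
       for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => '<foo/>' );

# Serialization with the flags left unset.
my $compact   = $doc->documentElement->toString;
my $with_decl = $doc->toString;

my ( $expanded, $no_decl );
{
    # The flags only apply while the local values are in scope.
    local $XML::LibXML::setTagCompression  = 1;
    local $XML::LibXML::skipXMLDeclaration = 1;
    $expanded = $doc->documentElement->toString;
    $no_decl  = $doc->toString;
}

print "$compact becomes $expanded\n";
```

       Using local inside a block restores the previous values
       automatically, so other serialization calls are not affected.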
440

PARSER OPTIONS

442       Handling of libxml2 parser options has been unified and improved in
443       XML::LibXML 1.70. You can now set default options for a particular
444       parser instance by passing them to the constructor as
445       "XML::LibXML->new({name=>value, ...})" or
446       "XML::LibXML->new(name=>value,...)". The options can be queried and
447       changed using the following methods (pre-1.70 interfaces such as
448       "$parser->load_ext_dtd(0)" also exist, see below):
449
450       option_exists
451             $parser->option_exists($name);
452
453           Returns 1 if the current XML::LibXML version supports the option
454           $name, otherwise returns 0 (note that this does not necessarily
455           mean that the option is supported by the underlying libxml2
456           library).
457
458       get_option
459             $parser->get_option($name);
460
461           Returns the current value of the parser option $name.
462
463       set_option
464             $parser->set_option($name,$value);
465
466           Sets option $name to value $value.
467
468       set_options
469             $parser->set_options({$name=>$value,...});
470
471           Sets multiple parsing options at once.
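
       A short sketch of these four methods used together. The option
       names are taken from the list below, except 'no_such_option', which
       is a made-up name used to show a negative result.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new( no_network => 1 );

# Query an option that was set in the constructor.
my $no_network = $parser->get_option('no_network');

# Change one option, then several at once.
$parser->set_option( no_blanks => 1 );
$parser->set_options( { no_cdata => 1, pedantic_parser => 1 } );

# Check whether this XML::LibXML version knows an option name.
my $known   = $parser->option_exists('no_blanks');
my $unknown = $parser->option_exists('no_such_option');

printf "no_network=%s no_blanks=%s known=%s unknown=%s\n",
    $no_network, $parser->get_option('no_blanks'), $known, $unknown;
```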
472
473       IMPORTANT NOTE: This documentation reflects the parser flags available
474       in libxml2 2.7.3. Some options have no effect if an older version of
475       libxml2 is used.
476
477       Each of the flags listed below is labeled
478
479       /parser/
480           if it can be used with a "XML::LibXML" parser object (i.e. passed
481           to "XML::LibXML->new", "XML::LibXML->set_option", etc.)
482
483       /html/
484           if it can be used passed to the "parse_html_*" methods
485
486       /reader/
487           if it can be used with the "XML::LibXML::Reader".
488
489       Unless specified otherwise, the default for boolean valued options is 0
490       (false).
491
492       The available options are:
493
494       URI /parser, html, reader/
495
496           In case of parsing strings or file handles, XML::LibXML doesn't
497           know about the base uri of the document. To make relative
498           references such as XIncludes work, one has to set a base URI, that
499           is then used for the parsed document.
500
501       line_numbers
502           /parser, html, reader/
503
504           If this option is activated, libxml2 will store the line number of
505           each element node in the parsed document. The line number can be
506           obtained using the "line_number()" method of the
507           "XML::LibXML::Node" class (for non-element nodes this may report
508           the line number of the containing element). The line numbers are
509           also used for reporting positions of validation errors.
510
511           IMPORTANT: Due to limitations in the libxml2 library line numbers
512           greater than 65535 will be returned as 65535. Unfortunately, this
513           is a long and sad story, please see
514           <http://bugzilla.gnome.org/show_bug.cgi?id=325533> for more
515           details.
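
           For example, the following sketch enables the option for a
           single parse and reports the line of each element. The sample
           document is an assumption made for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string       => "<a>\n  <b/>\n  <c/>\n</a>",
    line_numbers => 1,
);

# Report the line on which each element starts.
my %line_of;
for my $element ( $doc->findnodes('//*') ) {
    $line_of{ $element->nodeName } = $element->line_number;
    printf "%s is on line %d\n", $element->nodeName, $element->line_number;
}
```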
516
517       encoding
518           /html/
519
520           character encoding of the input
521
522       recover
523           /parser, html, reader/
524
525           recover from errors; possible values are 0, 1, and 2
526
527           A true value turns on recovery mode which allows one to parse
528           broken XML or HTML data. The recovery mode allows the parser to
529           return the successfully parsed portion of the input document. This
530           is useful for almost well-formed documents, where for example a
531           closing tag is missing somewhere. Still, XML::LibXML will only
532           parse until the first fatal (non-recoverable) error occurs,
533           reporting recoverable parsing errors as warnings. To suppress even
534           these warnings, use recover=>2.
535
536           Note that validation is switched off automatically in recovery
537           mode.
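
           The difference between strict parsing and recovery mode can be
           sketched as follows, assuming an input whose closing tag is
           missing. The variable names are illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

my $broken = '<foo>bar';    # closing tag is missing

# Without recovery the parser throws an exception.
my $strict       = eval { XML::LibXML->load_xml( string => $broken ) };
my $strict_error = "$@";

# With recover => 2 the parsed portion is returned and the
# recoverable errors are not reported as warnings.
my $recovered = XML::LibXML->load_xml( string => $broken, recover => 2 );

print "strict parse failed\n" unless defined $strict;
print "recovered: ", $recovered->documentElement->toString, "\n";
```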
538
539       expand_entities
540           /parser, reader/
541
542           substitute entities; possible values are 0 and 1; default is 1
543
544           Note that although this flag disables entity substitution, it does
545           not prevent the parser from loading external entities; when
546           substitution of an external entity is disabled, the entity will be
547           represented in the document tree by an XML_ENTITY_REF_NODE node
548           whose subtree will be the content obtained by parsing the external
549           resource; Although this nesting is visible from the DOM it is
550           transparent to XPath data model, so it is possible to match nodes
551           in an unexpanded entity by the same XPath expression as if the
552           entity were expanded.  See also ext_ent_handler.
553
554       ext_ent_handler
555           /parser/
556
557           Provide a custom external entity handler to be used when
558           expand_entities is set to 1. Possible value is a subroutine
559           reference.
560
561           This feature does not work properly in libxml2 < 2.6.27!
562
563           The subroutine provided is called whenever the parser needs to
564           retrieve the content of an external entity. It is called with two
565           arguments: the system ID (URI) and the public ID. The value
566           returned by the subroutine is parsed as the content of the entity.
567
568           This method can be used to completely disable entity loading, e.g.
569           to prevent exploits of the type described at
570           (<http://searchsecuritychannel.techtarget.com/generic/0,295582,sid97_gci1304703,00.html>),
571           where a service is tricked to expose its private data by letting it
572           parse a remote file (RSS feed) that contains an entity reference to
573           a local file (e.g. "/etc/fstab").
574
575           A more granular solution to this problem, however, is provided by
576           custom URL resolvers, as in
577
             my $c = XML::LibXML::InputCallback->new();
             sub match {   # accept file:/ URIs except for XML catalogs in /etc/xml/
               my ($uri) = @_;
               return ($uri=~m{^file:/}
                       and $uri !~ m{^file:///etc/xml/})
                      ? 1 : 0;
             }
             $c->register_callbacks([ \&match, sub{}, sub{}, sub{} ]);
             $parser->input_callbacks($c);
587
588       load_ext_dtd
589           /parser, reader/
590
591           load the external DTD subset while parsing; possible values are 0
592           and 1. Unless specified, XML::LibXML sets this option to 1.
593
           This flag is also required for DTD validation, to provide complete
           attributes, and to expand entities, regardless of whether the
           document has an internal subset. Thus switching off external DTD
           loading will disable entity expansion, validation, and complete
           attributes on internal subsets as well.
599
600       complete_attributes
601           /parser, reader/
602
603           create default DTD attributes; possible values are 0 and 1
604
605       validation
606           /parser, reader/
607
608           validate with the DTD; possible values are 0 and 1
609
610       suppress_errors
611           /parser, html, reader/
612
613           suppress error reports; possible values are 0 and 1
614
615       suppress_warnings
616           /parser, html, reader/
617
618           suppress warning reports; possible values are 0 and 1
619
620       pedantic_parser
621           /parser, html, reader/
622
623           pedantic error reporting; possible values are 0 and 1
624
625       no_blanks
626           /parser, html, reader/
627
628           remove blank nodes; possible values are 0 and 1
629
630       no_defdtd
631           /html/
632
633           do not add a default DOCTYPE; possible values are 0 and 1
634
           The default (0) is to add a DTD when the input HTML lacks one.
636
637       expand_xinclude or xinclude
638           /parser, reader/
639
640           Implement XInclude substitution; possible values are 0 and 1
641
642           Expands XInclude tags immediately while parsing the document. Note
643           that the parser will use the URI resolvers installed via
644           "XML::LibXML::InputCallback" to parse the included document (if
645           any).
646
647       no_xinclude_nodes
648           /parser, reader/
649
650           do not generate XINCLUDE START/END nodes; possible values are 0 and
651           1
652
653       no_network
654           /parser, html, reader/
655
656           Forbid network access; possible values are 0 and 1
657
658           If set to true, all attempts to fetch non-local resources (such as
659           DTD or external entities) will fail (unless custom callbacks are
660           defined).
661
662           It may be necessary to use the flag "recover" for processing
663           documents requiring such resources while networking is off.
664
665       clean_namespaces
666           /parser, reader/
667
           remove redundant namespace declarations during parsing; possible
           values are 0 and 1.
670
671       no_cdata
672           /parser, html, reader/
673
674           merge CDATA as text nodes; possible values are 0 and 1
675
676       no_basefix
677           /parser, reader/
678
           do not fix up XInclude xml:base URIs; possible values are 0 and 1
680
681       huge
682           /parser, html, reader/
683
684           relax any hardcoded limit from the parser; possible values are 0
685           and 1. Unless specified, XML::LibXML sets this option to 0.
686
687           Note: the default value for this option was changed to protect
688           against denial of service through entity expansion attacks. Before
689           enabling the option ensure you have taken alternative measures to
690           protect your application against this type of attack.
691
692       gdome
693           /parser/
694
695           THIS OPTION IS EXPERIMENTAL!
696
697           Although quite powerful, XML::LibXML's DOM implementation is
698           incomplete with respect to the DOM level 2 or level 3
699           specifications. XML::GDOME is based on libxml2 as well, and
700           provides a rather complete DOM implementation by wrapping libgdome.
701           This flag allows you to make use of XML::LibXML's full parser
702           options and XML::GDOME's DOM implementation at the same time.
703
704           To make use of this function, one has to install libgdome and
705           configure XML::LibXML to use this library. For this you need to
706           rebuild XML::LibXML!
707
708           Note: this feature was not seriously tested in recent XML::LibXML
709           releases.
710
711       For compatibility with XML::LibXML versions prior to 1.70, the
712       following methods are also supported for querying and setting the
713       corresponding parser options (if called without arguments, the methods
714       return the current value of the corresponding parser options; with an
715       argument sets the option to a given value):
716
717         $parser->validation();
718         $parser->recover();
719         $parser->pedantic_parser();
720         $parser->line_numbers();
721         $parser->load_ext_dtd();
722         $parser->complete_attributes();
723         $parser->expand_xinclude();
724         $parser->gdome_dom();
725         $parser->clean_namespaces();
726         $parser->no_network();
727
728       The following obsolete methods trigger parser options in some special
729       way:
730
731       recover_silently
732             $parser->recover_silently(1);
733
734           If called without an argument, returns true if the current value of
735           the "recover" parser option is 2 and returns false otherwise. With
736           a true argument sets the "recover" parser option to 2; with a false
737           argument sets the "recover" parser option to 0.
738
739       expand_entities
740             $parser->expand_entities(0);
741
742           Get/set the "expand_entities" option. If called with a true
743           argument, also turns the "load_ext_dtd" option to 1.
744
745       keep_blanks
746             $parser->keep_blanks(0);
747
           This is actually the opposite of the "no_blanks" parser option. If
           used without an argument, retrieves the negated value of
           "no_blanks". If used with an argument, sets "no_blanks" to the
           opposite value.
751
752       base_uri
753             $parser->base_uri( $your_base_uri );
754
755           Get/set the "URI" option.
756

XML CATALOGS

758       "libxml2" supports XML catalogs. Catalogs are used to map remote
759       resources to their local copies. Using catalogs can speed up parsing
760       processes if many external resources from remote addresses are loaded
761       into the parsed documents (such as DTDs or XIncludes).
762
763       Note that libxml2 has a global pool of loaded catalogs, so if you apply
764       the method "load_catalog" to one parser instance, all parser instances
765       will start using the catalog (in addition to other previously loaded
766       catalogs).
767
       Note also that catalogs are not used when a custom external entity
       handler is specified. Currently it is not possible to make use of
       both types of resolving systems at the same time.
771
772       load_catalog
773             $parser->load_catalog( $catalog_file );
774
775           Loads the XML catalog file $catalog_file.
776
777             # Global external entity loader (similar to ext_ent_handler option
778             # but this works really globally, also in XML::LibXSLT include etc..)
779
780             XML::LibXML::externalEntityLoader(\&my_loader);
781

ERROR REPORTING

783       XML::LibXML throws exceptions during parsing, validation or XPath
784       processing (and some other occasions). These errors can be caught by
785       using eval blocks. The error is stored in $@. There are two
786       implementations: the old one throws $@ which is just a message string,
787       in the new one $@ is an object from the class XML::LibXML::Error; this
788       class overrides the operator "" so that when printed, the object
789       flattens to the usual error message.
790
       XML::LibXML throws errors as they occur. Overlooking this is a very
       common mistake in the use of XML::LibXML. If the eval is omitted,
       XML::LibXML will always halt your script by "croaking" (see the Carp
       man page for details).
795
       Also note that an increasing number of functions throw errors if bad
       data is passed as arguments. If you cannot ensure that valid data is
       passed to XML::LibXML, you should wrap calls to these functions in
       eval.
799
800       Note: since version 1.59, get_last_error() is no longer available in
801       XML::LibXML for thread-safety reasons.
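
       The following sketch shows one way to inspect a caught error. It
       assumes the newer implementation, in which $@ is an
       XML::LibXML::Error object, and falls back to treating $@ as a plain
       message string otherwise. The sample input is an assumption.

```perl
use strict;
use warnings;
use Scalar::Util 'blessed';
use XML::LibXML;

# The <unclosed> element is never closed, so parsing fails.
my $doc   = eval { XML::LibXML->load_xml( string => "<root>\n<unclosed>\n</root>" ) };
my $error = $@;

if ( blessed($error) && $error->isa('XML::LibXML::Error') ) {
    # Structured access to the details of the error.
    printf "line %d: %s\n", $error->line, $error->message;
}
else {
    # Old-style errors are plain message strings.
    print "parse failed: $error\n";
}
```

       Copying $@ into a lexical variable right after the eval avoids
       losing the error if later code runs another eval.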
802

AUTHORS

804       Matt Sergeant, Christian Glahn, Petr Pajas
805

VERSION

807       2.0205

COPYRIGHT

810       2001-2007, AxKit.com Ltd.
811
812       2002-2006, Christian Glahn.
813
814       2006-2009, Petr Pajas.
815

LICENSE

817       This program is free software; you can redistribute it and/or modify it
818       under the same terms as Perl itself.
819
perl v5.32.0                      2020-07-28            XML::LibXML::Parser(3)