XML::LibXML::Parser(3)    User Contributed Perl Documentation    XML::LibXML::Parser(3)
2
3
4
NAME
6 XML::LibXML::Parser - Parsing XML Data with XML::LibXML
7
SYNOPSIS
9 use XML::LibXML 1.70;
10
11 # Parser constructor
12
13 $parser = XML::LibXML->new();
14 $parser = XML::LibXML->new(option=>value, ...);
15 $parser = XML::LibXML->new({option=>value, ...});
16
17 # Parsing XML
18
19 $dom = XML::LibXML->load_xml(
20 location => $file_or_url
21 # parser options ...
22 );
23 $dom = XML::LibXML->load_xml(
24 string => $xml_string
25 # parser options ...
26 );
27 $dom = XML::LibXML->load_xml(
28 string => (\$xml_string)
29 # parser options ...
30 );
31 $dom = XML::LibXML->load_xml({
32 IO => $perl_file_handle
33 # parser options ...
});
35 $dom = $parser->load_xml(...);
36
37 # Parsing HTML
38
39 $dom = XML::LibXML->load_html(...);
40 $dom = $parser->load_html(...);
41
42 # Parsing well-balanced XML chunks
43
44 $fragment = $parser->parse_balanced_chunk( $wbxmlstring, $encoding );
45
46 # Processing XInclude
47
48 $parser->process_xincludes( $doc );
49 $parser->processXIncludes( $doc );
50
51 # Old-style parser interfaces
52
53 $doc = $parser->parse_file( $xmlfilename );
54 $doc = $parser->parse_fh( $io_fh );
55 $doc = $parser->parse_string( $xmlstring);
56 $doc = $parser->parse_html_file( $htmlfile, \%opts );
57 $doc = $parser->parse_html_fh( $io_fh, \%opts );
58 $doc = $parser->parse_html_string( $htmlstring, \%opts );
59
60 # Push parser
61
62 $parser->parse_chunk($string, $terminate);
63 $parser->init_push();
64 $parser->push(@data);
65 $doc = $parser->finish_push( $recover );
66
67 # Set/query parser options
68
69 $parser->option_exists($name);
70 $parser->get_option($name);
71 $parser->set_option($name,$value);
72 $parser->set_options({$name=>$value,...});
73
74 # XML catalogs
75
76 $parser->load_catalog( $catalog_file );
77
PARSING
79 An XML document is read into a data structure such as a DOM tree by a
80 piece of software, called a parser. XML::LibXML currently provides four
81 different parser interfaces:
82
83 · A DOM Pull-Parser
84
85 · A DOM Push-Parser
86
87 · A SAX Parser
88
89 · A DOM based SAX Parser.
90
91 Creating a Parser Instance
92 XML::LibXML provides an OO interface to the libxml2 parser functions.
93 Thus you have to create a parser instance before you can parse any XML
94 data.
95
96 new
97 $parser = XML::LibXML->new();
98 $parser = XML::LibXML->new(option=>value, ...);
99 $parser = XML::LibXML->new({option=>value, ...});
100
101 Create a new XML and HTML parser instance. Each parser instance
102 holds default values for various parser options. Optionally, one
103 can pass a hash reference or a list of option => value pairs to set
a different default set of options. Unless specified otherwise,
the options "load_ext_dtd" and "expand_entities" are set to 1. See
"Parser Options" for a list of libxml2 parser options.
107
108 DOM Parser
One of the common parser interfaces of XML::LibXML is the DOM parser.
This parser reads XML data into a DOM-like data structure, so that each
tag can be accessed and transformed.

XML::LibXML's DOM parser is capable of parsing not only XML data but
also (strict) HTML files. There are three ways to parse documents: as
a string, as a Perl filehandle, or as a filename/URL. The return value
from each is an XML::LibXML::Document object, which is a DOM object.
117
All of the functions listed below will throw an exception if the
document is invalid. To prevent this from terminating your program,
wrap the call in an eval{} block.
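The eval{} pattern can be sketched as follows. This is a minimal illustration only; the sample string and the way the failure is reported are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input with a mismatched closing tag.
my $bad_xml = '<root><item></root>';

# eval returns undef if load_xml dies; the error is then left in $@.
my $dom   = eval { XML::LibXML->load_xml( string => $bad_xml ) };
my $error = $@;

if ( !defined $dom ) {
    warn "could not parse input: $error";
}
```

Copying $@ into a lexical right after the eval avoids losing it if later code resets $@.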
121
122 load_xml
123 $dom = XML::LibXML->load_xml(
124 location => $file_or_url
125 # parser options ...
126 );
127 $dom = XML::LibXML->load_xml(
128 string => $xml_string
129 # parser options ...
130 );
131 $dom = XML::LibXML->load_xml(
132 string => (\$xml_string)
133 # parser options ...
134 );
135 $dom = XML::LibXML->load_xml({
136 IO => $perl_file_handle
137 # parser options ...
});
139 $dom = $parser->load_xml(...);
140
This function is available since XML::LibXML 1.70. It provides an
easy-to-use interface to the XML parser that parses a given file (or
URL), string, or input stream into a DOM tree. The arguments can be passed
144 in a HASH reference or as name => value pairs. The function can be
145 called as a class method or an object method. In both cases it
146 internally creates a new parser instance passing the specified
147 parser options; if called as an object method, it clones the
148 original parser (preserving its settings) and additionally applies
149 the specified options to the new parser. See the constructor "new"
150 and "Parser Options" for more information.
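The difference between the class-method and object-method forms might be sketched like this. The sample document and the chosen options are illustrative assumptions.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = '<doc><item id="1">one</item></doc>';

# Class method: a temporary parser is created with the given options.
my $dom = XML::LibXML->load_xml( string => $xml );

# Object method: the parser is cloned, so no_blanks => 1 carries over
# and line_numbers => 1 applies to this call only.
my $parser = XML::LibXML->new( no_blanks => 1 );
my $dom2   = $parser->load_xml( string => \$xml, line_numbers => 1 );

print $dom->documentElement->nodeName, "\n";

# The original parser keeps its own defaults.
print $parser->get_option('line_numbers') ? "changed\n" : "unchanged\n";
```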
151
152 load_html
153 $dom = XML::LibXML->load_html(...);
154 $dom = $parser->load_html(...);
155
This function is available since XML::LibXML 1.70. It has the same
usage as "load_xml", providing an interface to the HTML parser. See
158 "load_xml" for more information.
159
Parsing HTML may cause problems, especially if a bare ampersand ('&')
is used. This is common when the HTML contains links to CGI scripts.
Such links cause the parser to report errors. In such cases libxml2
still parses the entire document as if there were no error, but the
error causes XML::LibXML to stop the parsing process. The document is
not lost, however: such HTML documents should be parsed using the
recover flag. By default, recovery is deactivated.
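A minimal sketch of parsing such HTML with the recover flag follows. The HTML snippet and the link target are made up for the example; recover => 2 is used so that the recoverable errors are not reported as warnings either.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical page with an unescaped ampersand in a CGI link.
my $html = '<html><body>'
         . '<a href="/cgi-bin/search?a=1&b=2">results</a>'
         . '</body></html>';

# Without recover, this input may make load_html throw an exception.
my $dom = XML::LibXML->load_html( string => $html, recover => 2 );

my ($link) = $dom->findnodes('//a');
print $link->getAttribute('href'), "\n" if $link;
```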
167
The functions described above are implemented to parse well-formed
documents. In some cases a program gets well-balanced XML instead of
well-formed documents (e.g. an XML fragment from a database). With
XML::LibXML it is not necessary to wrap such fragments in the calling
code, because XML::LibXML can parse well-balanced XML fragments
directly.
174
175 parse_balanced_chunk
176 $fragment = $parser->parse_balanced_chunk( $wbxmlstring, $encoding );
177
This function parses a well-balanced XML string into an
XML::LibXML::DocumentFragment. The first argument contains the
input string; the optional second argument can be used to specify the
character encoding of the input (UTF-8 is assumed by default).
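As an illustration, a fragment with several top-level nodes can be parsed and then attached to a document. The element names below are assumptions chosen for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;

# Two sibling elements plus trailing text: well balanced, but not a
# well-formed document because there is no single root element.
my $fragment =
    $parser->parse_balanced_chunk('<a>1</a><b>2</b>trailing text');
my @names = map { $_->nodeName } $fragment->childNodes;
print "fragment children: @names\n";

# Attach the fragment's children below a freshly created root element.
my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $doc->createElement('wrapper');
$doc->setDocumentElement($root);
$root->appendChild($fragment);

print $doc->documentElement->toString, "\n";
```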
182
183 parse_xml_chunk
This is the old name of parse_balanced_chunk(). Because it may
cause confusion with the push parser interface, this function
should no longer be used.
187
By default XML::LibXML does not process XInclude tags within an XML
document (see options section below). XML::LibXML allows you to
post-process a document to expand XInclude tags.
191
192 process_xincludes
193 $parser->process_xincludes( $doc );
194
After a document is parsed into a DOM structure, you may want to
expand the document's XInclude tags. This function processes the
197 given document structure and expands all XInclude tags (or throws
198 an error) by using the flags and callbacks of the given parser
199 instance.
200
Note that the resulting tree contains some extra nodes (of type
XML_XINCLUDE_START and XML_XINCLUDE_END) after successfully
processing the document. These nodes indicate where data was
included into the original tree. If the document is serialized,
these extra nodes will not show up.
206
207 Remember: A Document with processed XIncludes differs from the
208 original document after serialization, because the original
209 XInclude tags will not get restored!
210
If the parser flag "expand_xinclude" is set to 1, you do not need to
post-process the parsed document.
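A rough sketch of post-processing XIncludes is shown below. It assumes a temporary directory holding a hypothetical included file part.xml, and it passes a base URI (a made-up main.xml path in the same directory) so that the relative href can be resolved.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::LibXML;

# Write a hypothetical file to be included.
my $dir  = tempdir( CLEANUP => 1 );
my $part = File::Spec->catfile( $dir, 'part.xml' );
open my $out, '>', $part or die "cannot write $part: $!";
print {$out} '<part>included text</part>';
close $out or die "cannot close $part: $!";

my $xml = '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
        . '<xi:include href="part.xml"/></doc>';

# The URI option supplies the base URI that a string input lacks.
my $parser = XML::LibXML->new;
my $doc    = $parser->load_xml(
    string => $xml,
    URI    => File::Spec->catfile( $dir, 'main.xml' ),
);

# Expand the xi:include element in place.
$parser->process_xincludes($doc);
print $doc->toString;
```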
213
214 processXIncludes
215 $parser->processXIncludes( $doc );
216
This is an alias for process_xincludes with a Java-like function
name.
219
220 parse_file
221 $doc = $parser->parse_file( $xmlfilename );
222
This function parses an XML document from a file or network;
$xmlfilename can be either a filename or a URL. Note that for
parsing files, this function is the fastest choice, about 6-8 times
faster than parse_fh().
227
228 parse_fh
229 $doc = $parser->parse_fh( $io_fh );
230
parse_fh() parses an IOREF or a subclass of IO::Handle.
232
233 Because the data comes from an open handle, libxml2's parser does
234 not know about the base URI of the document. To set the base URI
235 one should use parse_fh() as follows:
236
237 my $doc = $parser->parse_fh( $io_fh, $baseuri );
238
239 parse_string
240 $doc = $parser->parse_string( $xmlstring);
241
242 This function is similar to parse_fh(), but it parses an XML
243 document that is available as a single string in memory, or
244 alternatively as a reference to a scalar containing a string.
245 Again, you can pass an optional base URI to the function.
246
247 my $doc = $parser->parse_string( $xmlstring, $baseuri );
248 my $doc = $parser->parse_string(\$xmlstring, $baseuri);
249
250 parse_html_file
251 $doc = $parser->parse_html_file( $htmlfile, \%opts );
252
Similar to parse_file() but parses HTML (strict) documents;
$htmlfile can be a filename or a URL.
255
256 An optional second argument can be used to pass some options to the
257 HTML parser as a HASH reference. See options labeled with HTML in
258 "Parser Options".
259
260 parse_html_fh
261 $doc = $parser->parse_html_fh( $io_fh, \%opts );
262
263 Similar to parse_fh() but parses HTML (strict) streams.
264
265 An optional second argument can be used to pass some options to the
266 HTML parser as a HASH reference. See options labeled with HTML in
267 "Parser Options".
268
269 Note: encoding option may not work correctly with this function in
270 libxml2 < 2.6.27 if the HTML file declares charset using a META
271 tag.
272
273 parse_html_string
274 $doc = $parser->parse_html_string( $htmlstring, \%opts );
275
276 Similar to parse_string() but parses HTML (strict) strings.
277
278 An optional second argument can be used to pass some options to the
279 HTML parser as a HASH reference. See options labeled with HTML in
280 "Parser Options".
281
282 Push Parser
283 XML::LibXML provides a push parser interface. Rather than pulling the
284 data from a given source the push parser waits for the data to be
285 pushed into it.
286
287 This allows one to parse large documents without waiting for the parser
288 to finish. The interface is especially useful if a program needs to
289 pre-process the incoming pieces of XML (e.g. to detect document
290 boundaries).
291
While XML::LibXML's parse_*() functions require the data to be
well-formed XML, the push parser will take any arbitrary string that
contains some XML data. The only requirement is that all the pushed
strings together form a well-formed document. With the push parser
interface a program can interrupt the parsing process as required,
where the parse_*() functions do not give enough flexibility.

Unlike the pull parser implemented in parse_fh() or parse_file(),
the push parser is not able to find out about the document's end
itself. Thus the calling program needs to indicate explicitly when the
parsing is done.
303
304 In XML::LibXML this is done by a single function:
305
306 parse_chunk
307 $parser->parse_chunk($string, $terminate);
308
parse_chunk() tries to parse a given chunk of data, which isn't
necessarily well-balanced data. The function takes two parameters:
the chunk of data as a string and, optionally, a termination flag. If
the termination flag is set to a true value (e.g. 1), the parsing
will be stopped and the resulting document will be returned, as the
following example shows:
315
316 my $parser = XML::LibXML->new;
317 for my $string ( "<", "foo", ' bar="hello world"', "/>") {
318 $parser->parse_chunk( $string );
319 }
320 my $doc = $parser->parse_chunk("", 1); # terminate the parsing
321
322 Internally XML::LibXML provides three functions that control the push
323 parser process:
324
325 init_push
326 $parser->init_push();
327
328 Initializes the push parser.
329
330 push
331 $parser->push(@data);
332
333 This function pushes the data stored inside the array to libxml2's
334 parser. Each entry in @data must be a normal scalar! This method
335 can be called repeatedly.
336
337 finish_push
338 $doc = $parser->finish_push( $recover );
339
This function returns the result of the parsing process. If this
function is called without a parameter it will complain about
documents that are not well-formed. If $recover is 1, the push parser
can be used to recover broken or non-well-formed (XML) documents, as
the following example shows:
345
346 eval {
347 $parser->push( "<foo>", "bar" );
348 $doc = $parser->finish_push(); # will report broken XML
349 };
350 if ( $@ ) {
351 # ...
352 }
353
354 This can be annoying if the closing tag is missed by accident. The
355 following code will restore the document:
356
357 eval {
358 $parser->push( "<foo>", "bar" );
359 $doc = $parser->finish_push(1); # will return the data parsed
360 # unless an error happened
361 };
362
363 print $doc->toString(); # returns "<foo>bar</foo>"
364
365 Of course finish_push() will return nothing if there was no data
366 pushed to the parser before.
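Putting the three functions together, a complete push-parsing run might look like the sketch below. The chunk boundaries are arbitrary assumptions; note that one element is deliberately split across two chunks.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;

# Hypothetical chunks as they might arrive from a socket or a pipe.
my @chunks = (
    '<log>', '<entry>first</entry>', '<entry>sec', 'ond</entry>', '</log>',
);

$parser->init_push;
$parser->push($_) for @chunks;

# No argument: finish_push dies if the pushed data is not well formed.
my $doc = $parser->finish_push;

my @entries = $doc->findnodes('/log/entry');
printf "%d entries\n", scalar @entries;
```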
367
368 Pull Parser (Reader)
369 XML::LibXML also provides a pull-parser interface similar to the
370 XmlReader interface in .NET. This interface is almost streaming, and is
371 usually faster and simpler to use than SAX. See XML::LibXML::Reader.
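For comparison with the DOM interfaces above, a small sketch of the reader interface follows. The input string is an assumption for the example; see XML::LibXML::Reader for the authoritative description of the methods used.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(
    string => '<list><item>a</item><item>b</item></list>',
);

my @items;
while ( $reader->read ) {
    # Only look at opening <item> elements.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT
        and $reader->name eq 'item';
    push @items, $reader->readInnerXml;
}

print "items: @items\n";
```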
372
373 Direct SAX Parser
374 XML::LibXML provides a direct SAX parser in the XML::LibXML::SAX
375 module.
376
377 DOM based SAX Parser
378 XML::LibXML also provides a DOM based SAX parser. The SAX parser is
379 defined in the module XML::LibXML::SAX::Parser. As it is not a stream
380 based parser, it parses documents into a DOM and traverses the DOM tree
381 instead.
382
383 The API of this parser is exactly the same as any other Perl SAX2
384 parser. See XML::SAX::Intro for details.
385
386 Aside from the regular parsing methods, you can access the DOM tree
387 traverser directly, using the generate() method:
388
389 my $doc = build_yourself_a_document();
my $saxparser = XML::LibXML::SAX::Parser->new( ... );
$saxparser->generate( $doc );
392
This is useful for serializing DOM trees, for example trees that you
have already processed or that you obtained as the result of XSLT
processing.
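As a sketch of generate() in use, the example below feeds an existing DOM to a tiny SAX handler. The ElementCounter package is hypothetical and exists only for this illustration; any SAX2 handler object could take its place.

```perl
use strict;
use warnings;

# Hypothetical handler that counts start_element events.
package ElementCounter;
sub new           { return bless { count => 0 }, shift }
sub start_element { $_[0]{count}++; return }

package main;
use XML::LibXML;
use XML::LibXML::SAX::Parser;

my $doc     = XML::LibXML->load_xml( string => '<a><b/><b/></a>' );
my $handler = ElementCounter->new;

# Traverse the DOM tree and emit SAX events to the handler.
my $saxparser = XML::LibXML::SAX::Parser->new( Handler => $handler );
$saxparser->generate($doc);

print "elements seen: $handler->{count}\n";
```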
396
397 WARNING
398
399 This is NOT a streaming SAX parser. As I said above, this parser reads
400 the entire document into a DOM and serialises it. Some people couldn't
read that in the paragraph above so I've added this warning. If you
want a streaming SAX parser, look at the XML::LibXML::SAX man page.
403
SERIALIZATION
405 XML::LibXML provides some functions to serialize nodes and documents.
406 The serialization functions are described on the XML::LibXML::Node
407 manpage or the XML::LibXML::Document manpage. XML::LibXML checks three
408 global flags that alter the serialization process:
409
410 · skipXMLDeclaration
411
412 · skipDTD
413
414 · setTagCompression
415
Of these three flags, only setTagCompression is available for all
serialization functions.

Because XML::LibXML does not set these flags itself, one has to define
them locally as the following example shows:
421
422 local $XML::LibXML::skipXMLDeclaration = 1;
423 local $XML::LibXML::skipDTD = 1;
424 local $XML::LibXML::setTagCompression = 1;
425
426 If skipXMLDeclaration is defined and not '0', the XML declaration is
427 omitted during serialization.
428
429 If skipDTD is defined and not '0', an existing DTD would not be
430 serialized with the document.
431
If setTagCompression is defined and not '0', empty tags are displayed
as open and closing tags rather than the shortcut. For example, the
empty tag foo will be rendered as <foo></foo> rather than <foo/>.
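The effect of the flags can be sketched as follows. The sample document is an assumption; the flags are localized inside a block so that they do not leak into other serialization calls.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => '<foo><bar/></foo>' );

my $plain = $doc->toString;    # default serialization

my $altered;
{
    # The flags only apply until the end of this block.
    local $XML::LibXML::skipXMLDeclaration = 1;
    local $XML::LibXML::setTagCompression  = 1;
    $altered = $doc->toString;
}

print "default:\n$plain\nwith flags:\n$altered\n";
```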
435
PARSER OPTIONS
437 Handling of libxml2 parser options has been unified and improved in
438 XML::LibXML 1.70. You can now set default options for a particular
439 parser instance by passing them to the constructor as
440 "XML::LibXML->new({name=>value, ...})" or
441 "XML::LibXML->new(name=>value,...)". The options can be queried and
442 changed using the following methods (pre-1.70 interfaces such as
443 "$parser->load_ext_dtd(0)" also exist, see below):
444
445 option_exists
446 $parser->option_exists($name);
447
448 Returns 1 if the current XML::LibXML version supports the option
449 $name, otherwise returns 0 (note that this does not necessarily
450 mean that the option is supported by the underlying libxml2
451 library).
452
453 get_option
454 $parser->get_option($name);
455
456 Returns the current value of the parser option $name.
457
458 set_option
459 $parser->set_option($name,$value);
460
461 Sets option $name to value $value.
462
463 set_options
464 $parser->set_options({$name=>$value,...});
465
466 Sets multiple parsing options at once.
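The four methods can be exercised together as in the following sketch. The particular options are chosen only for illustration, and no_such_option is a deliberately invalid name.

```perl
use strict;
use warnings;
use XML::LibXML;

# Defaults for this instance are passed to the constructor.
my $parser = XML::LibXML->new( no_network => 1 );

for my $name (qw(recover no_such_option)) {
    printf "%-15s %s\n", $name,
        $parser->option_exists($name) ? 'known' : 'unknown';
}

# Change one option, then several at once.
$parser->set_option( recover => 1 );
$parser->set_options( { no_blanks => 1, line_numbers => 1 } );

for my $name (qw(no_network recover no_blanks line_numbers)) {
    printf "%-15s %s\n", $name, $parser->get_option($name) // 0;
}
```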
467
468 IMPORTANT NOTE: This documentation reflects the parser flags available
469 in libxml2 2.7.3. Some options have no effect if an older version of
470 libxml2 is used.
471
472 Each of the flags listed below is labeled
473
474 /parser/
475 if it can be used with a "XML::LibXML" parser object (i.e. passed
476 to "XML::LibXML->new", "XML::LibXML->set_option", etc.)
477
478 /html/
if it can be passed to the "parse_html_*" methods
480
481 /reader/
482 if it can be used with the "XML::LibXML::Reader".
483
484 Unless specified otherwise, the default for boolean valued options is 0
485 (false).
486
487 The available options are:
488
489 URI /parser, html, reader/
490
491 In case of parsing strings or file handles, XML::LibXML doesn't
492 know about the base uri of the document. To make relative
493 references such as XIncludes work, one has to set a base URI, that
494 is then used for the parsed document.
495
496 line_numbers
497 /parser, html, reader/
498
499 If this option is activated, libxml2 will store the line number of
500 each element node in the parsed document. The line number can be
501 obtained using the "line_number()" method of the
502 "XML::LibXML::Node" class (for non-element nodes this may report
503 the line number of the containing element). The line numbers are
504 also used for reporting positions of validation errors.
505
506 IMPORTANT: Due to limitations in the libxml2 library line numbers
507 greater than 65535 will be returned as 65535. Unfortunately, this
508 is a long and sad story, please see
509 <http://bugzilla.gnome.org/show_bug.cgi?id=325533> for more
510 details.
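A minimal sketch of retrieving a stored line number, using a made-up three-line document:

```perl
use strict;
use warnings;
use XML::LibXML;

# The <b/> element sits on the second line of the input.
my $doc = XML::LibXML->load_xml(
    string       => "<a>\n  <b/>\n</a>",
    line_numbers => 1,
);

my ($node) = $doc->findnodes('//b');
print 'b starts on line ', $node->line_number, "\n";
```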
511
512 encoding
513 /html/
514
515 character encoding of the input
516
517 recover
518 /parser, html, reader/
519
520 recover from errors; possible values are 0, 1, and 2
521
522 A true value turns on recovery mode which allows one to parse
523 broken XML or HTML data. The recovery mode allows the parser to
524 return the successfully parsed portion of the input document. This
525 is useful for almost well-formed documents, where for example a
526 closing tag is missing somewhere. Still, XML::LibXML will only
527 parse until the first fatal (non-recoverable) error occurs,
528 reporting recoverable parsing errors as warnings. To suppress even
529 these warnings, use recover=>2.
530
531 Note that validation is switched off automatically in recovery
532 mode.
533
534 expand_entities
535 /parser, reader/
536
537 substitute entities; possible values are 0 and 1; default is 1
538
Note that although setting this flag to 0 disables entity
substitution, it does not prevent the parser from loading external
entities. When substitution of an external entity is disabled, the
entity will be represented in the document tree by an
XML_ENTITY_REF_NODE node whose subtree will be the content obtained
by parsing the external resource. Although this nesting is visible
from the DOM, it is transparent to the XPath data model, so it is
possible to match nodes in an unexpanded entity by the same XPath
expression as if the entity were expanded. See also ext_ent_handler.
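The difference is easiest to see with an internal entity. The sketch below uses a made-up entity named who and compares the serialized root element under both settings.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = '<!DOCTYPE d [<!ENTITY who "world">]><d>hello &who;</d>';

# Substitution on: the entity's replacement text becomes ordinary text.
my $expanded = XML::LibXML->load_xml( string => $xml, expand_entities => 1 );

# Substitution off: the reference is kept as an entity reference node.
my $kept = XML::LibXML->load_xml( string => $xml, expand_entities => 0 );

print $expanded->documentElement->toString, "\n";
print $kept->documentElement->toString,     "\n";
```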
548
549 ext_ent_handler
550 /parser/
551
552 Provide a custom external entity handler to be used when
553 expand_entities is set to 1. Possible value is a subroutine
554 reference.
555
556 This feature does not work properly in libxml2 < 2.6.27!
557
558 The subroutine provided is called whenever the parser needs to
559 retrieve the content of an external entity. It is called with two
560 arguments: the system ID (URI) and the public ID. The value
561 returned by the subroutine is parsed as the content of the entity.
562
563 This method can be used to completely disable entity loading, e.g.
564 to prevent exploits of the type described at
565 (<http://searchsecuritychannel.techtarget.com/generic/0,295582,sid97_gci1304703,00.html>),
566 where a service is tricked to expose its private data by letting it
567 parse a remote file (RSS feed) that contains an entity reference to
568 a local file (e.g. "/etc/fstab").
569
570 A more granular solution to this problem, however, is provided by
571 custom URL resolvers, as in
572
573 my $c = XML::LibXML::InputCallback->new();
574 sub match { # accept file:/ URIs except for XML catalogs in /etc/xml/
575 my ($uri) = @_;
576 return ($uri=~m{^file:/}
577 and $uri !~ m{^file:///etc/xml/})
578 ? 1 : 0;
579 }
580 $c->register_callbacks([ \&match, sub{}, sub{}, sub{} ]);
581 $parser->input_callbacks($c);
582
583 load_ext_dtd
584 /parser, reader/
585
586 load the external DTD subset while parsing; possible values are 0
587 and 1. Unless specified, XML::LibXML sets this option to 1.
588
This flag is also required for DTD validation, for providing
complete attributes, and for expanding entities, regardless of
whether the document has an internal subset. Thus switching off
external DTD loading will disable entity expansion, validation, and
complete attributes on internal subsets as well.
594
595 complete_attributes
596 /parser, reader/
597
598 create default DTD attributes; possible values are 0 and 1
599
600 validation
601 /parser, reader/
602
603 validate with the DTD; possible values are 0 and 1
604
605 suppress_errors
606 /parser, html, reader/
607
608 suppress error reports; possible values are 0 and 1
609
610 suppress_warnings
611 /parser, html, reader/
612
613 suppress warning reports; possible values are 0 and 1
614
615 pedantic_parser
616 /parser, html, reader/
617
618 pedantic error reporting; possible values are 0 and 1
619
620 no_blanks
621 /parser, html, reader/
622
623 remove blank nodes; possible values are 0 and 1
624
625 no_defdtd
626 /html/
627
628 do not add a default DOCTYPE; possible values are 0 and 1
629
630 the default is (0) to add a DTD when the input html lacks one
631
632 expand_xinclude or xinclude
633 /parser, reader/
634
635 Implement XInclude substitution; possible values are 0 and 1
636
637 Expands XInclude tags immediately while parsing the document. Note
638 that the parser will use the URI resolvers installed via
639 "XML::LibXML::InputCallback" to parse the included document (if
640 any).
641
642 no_xinclude_nodes
643 /parser, reader/
644
645 do not generate XINCLUDE START/END nodes; possible values are 0 and
646 1
647
648 no_network
649 /parser, html, reader/
650
651 Forbid network access; possible values are 0 and 1
652
653 If set to true, all attempts to fetch non-local resources (such as
654 DTD or external entities) will fail (unless custom callbacks are
655 defined).
656
657 It may be necessary to use the flag "recover" for processing
658 documents requiring such resources while networking is off.
659
660 clean_namespaces
661 /parser, reader/
662
663 remove redundant namespaces declarations during parsing; possible
664 values are 0 and 1.
665
666 no_cdata
667 /parser, html, reader/
668
669 merge CDATA as text nodes; possible values are 0 and 1
670
671 no_basefix
672 /parser, reader/
673
do not fix up XInclude xml:base URIs; possible values are 0 and 1
675
676 huge
677 /parser, html, reader/
678
679 relax any hardcoded limit from the parser; possible values are 0
680 and 1. Unless specified, XML::LibXML sets this option to 0.
681
682 Note: the default value for this option was changed to protect
683 against denial of service through entity expansion attacks. Before
684 enabling the option ensure you have taken alternative measures to
685 protect your application against this type of attack.
686
687 gdome
688 /parser/
689
690 THIS OPTION IS EXPERIMENTAL!
691
692 Although quite powerful, XML::LibXML's DOM implementation is
693 incomplete with respect to the DOM level 2 or level 3
specifications. XML::GDOME is based on libxml2 as well and
provides a rather complete DOM implementation by wrapping libgdome.
696 This flag allows you to make use of XML::LibXML's full parser
697 options and XML::GDOME's DOM implementation at the same time.
698
699 To make use of this function, one has to install libgdome and
700 configure XML::LibXML to use this library. For this you need to
701 rebuild XML::LibXML!
702
703 Note: this feature was not seriously tested in recent XML::LibXML
704 releases.
705
706 For compatibility with XML::LibXML versions prior to 1.70, the
707 following methods are also supported for querying and setting the
708 corresponding parser options (if called without arguments, the methods
709 return the current value of the corresponding parser options; with an
710 argument sets the option to a given value):
711
712 $parser->validation();
713 $parser->recover();
714 $parser->pedantic_parser();
715 $parser->line_numbers();
716 $parser->load_ext_dtd();
717 $parser->complete_attributes();
718 $parser->expand_xinclude();
719 $parser->gdome_dom();
720 $parser->clean_namespaces();
721 $parser->no_network();
722
723 The following obsolete methods trigger parser options in some special
724 way:
725
726 recover_silently
727 $parser->recover_silently(1);
728
729 If called without an argument, returns true if the current value of
730 the "recover" parser option is 2 and returns false otherwise. With
731 a true argument sets the "recover" parser option to 2; with a false
732 argument sets the "recover" parser option to 0.
733
734 expand_entities
735 $parser->expand_entities(0);
736
737 Get/set the "expand_entities" option. If called with a true
738 argument, also turns the "load_ext_dtd" option to 1.
739
740 keep_blanks
741 $parser->keep_blanks(0);
742
743 This is actually the opposite of the "no_blanks" parser option. If
744 used without an argument retrieves negated value of "no_blanks". If
745 used with an argument sets "no_blanks" to the opposite value.
746
747 base_uri
748 $parser->base_uri( $your_base_uri );
749
750 Get/set the "URI" option.
751
XML CATALOGS
753 "libxml2" supports XML catalogs. Catalogs are used to map remote
754 resources to their local copies. Using catalogs can speed up parsing
755 processes if many external resources from remote addresses are loaded
756 into the parsed documents (such as DTDs or XIncludes).
757
758 Note that libxml2 has a global pool of loaded catalogs, so if you apply
759 the method "load_catalog" to one parser instance, all parser instances
760 will start using the catalog (in addition to other previously loaded
761 catalogs).
762
Note also that catalogs are not used when a custom external entity
handler is specified. Currently it is not possible to use both types
of resolving systems at the same time.
766
767 load_catalog
768 $parser->load_catalog( $catalog_file );
769
770 Loads the XML catalog file $catalog_file.
771
772 # Global external entity loader (similar to ext_ent_handler option
773 # but this works really globally, also in XML::LibXSLT include etc..)
774
775 XML::LibXML::externalEntityLoader(\&my_loader);
776
ERROR REPORTING
778 XML::LibXML throws exceptions during parsing, validation or XPath
779 processing (and some other occasions). These errors can be caught by
780 using eval blocks. The error is stored in $@. There are two
781 implementations: the old one throws $@ which is just a message string,
782 in the new one $@ is an object from the class XML::LibXML::Error; this
783 class overrides the operator "" so that when printed, the object
784 flattens to the usual error message.
785
XML::LibXML throws errors as they occur; overlooking this is a very
common source of misunderstanding in the use of XML::LibXML. If the
eval is omitted, XML::LibXML will always halt your script by
"croaking" (see the Carp man page for details).
790
Also note that an increasing number of functions throw errors if bad
data is passed as arguments. If you cannot ensure that valid data is
passed to XML::LibXML, you should wrap these calls in eval as well.
794
795 Note: since version 1.59, get_last_error() is no longer available in
796 XML::LibXML for thread-safety reasons.
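With the newer implementation, the caught error can be inspected as an object. The following sketch assumes a deliberately broken input string and uses the line() and message() accessors documented in XML::LibXML::Error; the fallback branch covers the case where $@ is a plain string.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use XML::LibXML;

# Hypothetical input: <b> is never closed.
my $doc = eval { XML::LibXML->load_xml( string => "<a>\n<b></a>" ) };
my $err = $@;

if ( blessed $err and $err->isa('XML::LibXML::Error') ) {
    # Structured access to the error details.
    printf "line %d: %s\n", $err->line, $err->message;
}
elsif ($err) {
    # Old-style errors are plain message strings.
    print "error: $err\n";
}
```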
797
AUTHORS
799 Matt Sergeant, Christian Glahn, Petr Pajas
800
VERSION
802 2.0018
803
COPYRIGHT
805 2001-2007, AxKit.com Ltd.
806
807 2002-2006, Christian Glahn.
808
809 2006-2009, Petr Pajas.
810
811
812
813perl v5.16.3 2013-05-13 XML::LibXML::Parser(3)