Expat(3)              User Contributed Perl Documentation             Expat(3)

NAME

       XML::Parser::Expat - Low-level access to James Clark's expat XML parser

SYNOPSIS

        use XML::Parser::Expat;

        my $parser = XML::Parser::Expat->new;
        $parser->setHandlers('Start' => \&sh,
                             'End'   => \&eh,
                             'Char'  => \&ch);
        open(my $fh, '<', 'info.xml') or die "Couldn't open info.xml: $!";
        $parser->parse($fh);
        close($fh);
        # $parser->parse('<foo id="me"> here <em>we</em> go </foo>');

        sub sh
        {
          my ($p, $el, %atts) = @_;
          $p->setHandlers('Char' => \&spec)
            if ($el eq 'special');
          ...
        }

        sub eh
        {
          my ($p, $el) = @_;
          $p->setHandlers('Char' => \&ch)  # Special elements won't contain
            if ($el eq 'special');         # other special elements
          ...
        }

DESCRIPTION

       This module provides an interface to James Clark's XML parser, expat.
       As in expat, a single instance of the parser can only parse one
       document. Calls to parse, parsestring, or parsefile after the first
       for a given instance will die.

       Expat (and XML::Parser::Expat) are event based. As the parser
       recognizes parts of the document (say the start or end of an XML
       element), any handlers registered for that type of event are called
       with suitable parameters.

METHODS

       new This is a class method, the constructor for XML::Parser::Expat.
           Options are passed as keyword value pairs. The recognized options
           are:

           ·   ProtocolEncoding

               The protocol encoding name. The default is none. The expat
               built-in encodings are: "UTF-8", "ISO-8859-1", "UTF-16", and
               "US-ASCII".  Other encodings may be used if they have encoding
               maps in one of the directories in the @Encoding_Path list.
               Setting the protocol encoding overrides any encoding in the XML
               declaration.

           ·   Namespaces

               When this option is given with a true value, then the parser
               does namespace processing. By default, namespace processing is
               turned off. When it is turned on, the parser consumes xmlns
               attributes and strips off prefixes from element and attribute
               names where those prefixes have a defined namespace. A name's
               namespace can be found using the "namespace" method and two
               names can be checked for absolute equality with the "eq_name"
               method.

           ·   NoExpand

               Normally, the parser will try to expand references to entities
               defined in the internal subset. If this option is set to a true
               value, and a default handler is also set, then the default
               handler will be called when an entity reference is seen in
               text. This has no effect if a default handler has not been
               registered, and it has no effect on the expansion of entity
               references inside attribute values.

           ·   Stream_Delimiter

               This option takes a string value. When this string is found
               alone on a line while parsing from a stream, then the parse is
               ended as if it saw an end of file. The intended use is with a
               stream of xml documents in a MIME multipart format. The string
               should not contain a trailing newline.

           ·   ErrorContext

               When this option is defined, errors are reported in context.
               The value of ErrorContext should be the number of lines to show
               on either side of the line in which the error occurred.

           ·   ParseParamEnt

               Unless standalone is set to "yes" in the XML declaration,
               setting this to a true value allows the external DTD to be
               read, and parameter entities to be parsed and expanded.

           ·   Base

               The base to use for relative pathnames or URLs. This can also
               be done by using the base method.
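
       As a rough illustration of passing options to new, the sketch below
       turns on ErrorContext and feeds the parser a deliberately malformed
       document. The sample document and variable names are made up for the
       example, and the exact wording of the error message depends on the
       expat library in use.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

# Options are plain keyword/value pairs.
my $parser = XML::Parser::Expat->new(
    ErrorContext => 2,    # show two lines on either side of an error
);

# Malformed on purpose: <b> is never closed, so parse dies.
my $ok  = eval { $parser->parse("<a>\n<b>\n</a>\n"); 1 };
my $err = $ok ? '' : $@;
$parser->release;         # break circular references (see release)

print "not well formed:\n$err" unless $ok;
```

       Wrapping parse in eval is one way to keep a malformed document from
       terminating the program; the message in $@ then carries the position
       and the surrounding lines.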

       setHandlers(TYPE, HANDLER [, TYPE, HANDLER [...]])
           This method registers handlers for the various events. If no
           handlers are registered, then a call to parse, parsestring, or
           parsefile will only determine if the corresponding XML document is
           well formed (by returning without error). This may be called from
           within a handler, after the parse has started.

           Setting a handler to something that evaluates to false unsets that
           handler.

           This method returns a list of type, handler pairs corresponding to
           the input. The handlers returned are the ones that were in effect
           before the call to setHandlers.

           The recognized events and the parameters passed to the
           corresponding handlers are:

           ·   Start             (Parser, Element [, Attr, Val [,...]])

               This event is generated when an XML start tag is recognized.
               Parser is an XML::Parser::Expat instance. Element is the name
               of the XML element that is opened with the start tag. The Attr
               & Val pairs are generated for each attribute in the start tag.

           ·   End               (Parser, Element)

               This event is generated when an XML end tag is recognized. Note
               that an XML empty tag (<foo/>) generates both a start and an
               end event.

               There are always lower-level start and end handlers installed
               that wrap the corresponding callbacks. This is to handle the
               context mechanism.  A consequence of this is that the default
               handler (see below) will not see a start tag or end tag unless
               the default_current method is called.

           ·   Char              (Parser, String)

               This event is generated when non-markup is recognized. The
               non-markup sequence of characters is in String. A single
               non-markup sequence of characters may generate multiple calls
               to this handler. Whatever the encoding of the string in the
               original document, this is given to the handler in UTF-8.

           ·   Proc              (Parser, Target, Data)

               This event is generated when a processing instruction is
               recognized.

           ·   Comment           (Parser, String)

               This event is generated when a comment is recognized.

           ·   CdataStart        (Parser)

               This is called at the start of a CDATA section.

           ·   CdataEnd          (Parser)

               This is called at the end of a CDATA section.

           ·   Default           (Parser, String)

               This is called for any characters that don't have a registered
               handler.  This includes both characters that are part of markup
               for which no events are generated (markup declarations) and
               characters that could generate events, but for which no handler
               has been registered.

               Whatever the encoding in the original document, the string is
               returned to the handler in UTF-8.

           ·   Unparsed          (Parser, Entity, Base, Sysid, Pubid,
               Notation)

               This is called for a declaration of an unparsed entity. Entity
               is the name of the entity. Base is the base to be used for
               resolving a relative URI.  Sysid is the system id. Pubid is the
               public id. Notation is the notation name. Base and Pubid may be
               undefined.

           ·   Notation          (Parser, Notation, Base, Sysid, Pubid)

               This is called for a declaration of notation. Notation is the
               notation name.  Base is the base to be used for resolving a
               relative URI. Sysid is the system id. Pubid is the public id.
               Base, Sysid, and Pubid may all be undefined.

           ·   ExternEnt         (Parser, Base, Sysid, Pubid)

               This is called when an external entity is referenced. Base is
               the base to be used for resolving a relative URI. Sysid is the
               system id. Pubid is the public id. Base and Pubid may be
               undefined.

               This handler should either return a string, which represents
               the contents of the external entity, or return an open
               filehandle that can be read to obtain the contents of the
               external entity, or return undef, which indicates the external
               entity couldn't be found and will generate a parse error.

               If an open filehandle is returned, it must be returned as
               either a glob (*FOO) or as a reference to a glob (e.g. an
               instance of IO::Handle).

           ·   ExternEntFin      (Parser)

               This is called after an external entity has been parsed. It
               allows applications to perform cleanup on actions performed in
               the above ExternEnt handler.

           ·   Entity            (Parser, Name, Val, Sysid, Pubid, Ndata,
               IsParam)

               This is called when an entity is declared. For internal
               entities, the Val parameter will contain the value and the
               remaining three parameters will be undefined. For external
               entities, the Val parameter will be undefined, the Sysid
               parameter will have the system id, the Pubid parameter will
               have the public id if it was provided (it will be undefined
               otherwise), and the Ndata parameter will contain the notation
               for unparsed entities. If this is a parameter entity
               declaration, then the IsParam parameter is true.

               Note that this handler and the Unparsed handler above overlap.
               If both are set, then this handler will not be called for
               unparsed entities.

           ·   Element           (Parser, Name, Model)

               The element handler is called when an element declaration is
               found. Name is the element name, and Model is the content model
               as an XML::Parser::ContentModel object. See
               "XML::Parser::ContentModel Methods" for methods available for
               this class.

           ·   Attlist           (Parser, Elname, Attname, Type, Default,
               Fixed)

               This handler is called for each attribute in an ATTLIST
               declaration.  So an ATTLIST declaration that has multiple
               attributes will generate multiple calls to this handler. The
               Elname parameter is the name of the element with which the
               attribute is being associated. The Attname parameter is the
               name of the attribute. Type is the attribute type, given as a
               string. Default is the default value, which will either be
               "#REQUIRED", "#IMPLIED" or a quoted string (i.e. the returned
               string will begin and end with a quote character). If Fixed is
               true, then this is a fixed attribute.

           ·   Doctype           (Parser, Name, Sysid, Pubid, Internal)

               This handler is called for DOCTYPE declarations. Name is the
               document type name. Sysid is the system id of the document
               type, if it was provided, otherwise it's undefined. Pubid is
               the public id of the document type, which will be undefined if
               no public id was given. Internal will be true or false,
               indicating whether or not the doctype declaration contains an
               internal subset.

           ·   DoctypeFin        (Parser)

               This handler is called after parsing of the DOCTYPE declaration
               has finished, including any internal or external DTD
               declarations.

           ·   XMLDecl           (Parser, Version, Encoding, Standalone)

               This handler is called for XML declarations. Version is a
               string containing the version. Encoding is either undefined or
               contains an encoding string.  Standalone is either undefined,
               or true or false. Undefined indicates that no standalone
               parameter was given in the XML declaration. True or false
               indicates "yes" or "no" respectively.
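
       The handler signatures above can be tried out with a small script. The
       following is only a sketch: the handlers are anonymous subs invented
       for the example, and the document is the one-line sample from the
       SYNOPSIS.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my (@opened, @closed);
my $text = '';

my $parser = XML::Parser::Expat->new;
$parser->setHandlers(
    Start => sub { my ($p, $el, %atts) = @_; push @opened, $el },
    End   => sub { my ($p, $el) = @_; push @closed, $el },
    # Char may fire several times for one run of text, so accumulate.
    Char  => sub { my ($p, $str) = @_; $text .= $str },
);

# setHandlers returns (type, previous handler) pairs for the types given.
my @previous = $parser->setHandlers(Comment => sub { });

$parser->parse('<foo id="me"> here <em>we</em> go </foo>');
$parser->release;

print "opened: @opened\n";
print "closed: @closed\n";
print "text:   '$text'\n";
```

       Accumulating in the Char handler, rather than assuming one call per
       text node, follows from the note that a single run of character data
       may be delivered in several pieces.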

       namespace(name)
           Return the URI of the namespace that the name belongs to. If the
           name doesn't belong to any namespace, an undef is returned. This is
           only valid on names received through the Start or End handlers from
           a single document, or through a call to the generate_ns_name
           method. In other words, don't use names generated from one instance
           of XML::Parser::Expat with other instances.

       eq_name(name1, name2)
           Return true if name1 and name2 are identical (i.e. same name and
           from the same namespace). This is only meaningful if both names
           were obtained through the Start or End handlers from a single
           document, or through a call to the generate_ns_name method.

       generate_ns_name(name, namespace)
           Return a name, associated with a given namespace, good for using
           with the above 2 methods. The namespace argument should be the
           namespace URI, not a prefix.

       new_ns_prefixes
           When called from a start tag handler, returns namespace prefixes
           declared with this start tag. If called elsewhere (or if there were
           no namespace prefixes declared), it returns an empty list. Setting
           of the default namespace is indicated with '#default' as a prefix.

       expand_ns_prefix(prefix)
           Return the uri to which the given prefix is currently bound.
           Returns undef if the prefix isn't currently bound. Use '#default'
           to find the current binding of the default namespace (if any).

       current_ns_prefixes
           Return a list of currently bound namespace prefixes. The order of
           the prefixes in the list has no meaning. If the default namespace
           is currently bound, '#default' appears in the list.
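
       A minimal sketch of the namespace methods working together, assuming
       the Namespaces option is switched on. The document, the urn:example:x
       URI and the variable names are invented for illustration.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my $doc = '<doc xmlns:x="urn:example:x"><x:item/><plain/></doc>';

my (%uri_of, @declared);
my $matched = '';

my $parser = XML::Parser::Expat->new(Namespaces => 1);
$parser->setHandlers(Start => sub {
    my ($p, $el) = @_;

    # Prefixes are stripped, so $el reads as the local name here.
    my $uri = $p->namespace($el);
    $uri_of{$el} = defined $uri ? $uri : '(none)';

    # Prefixes declared on this very start tag (empty list otherwise).
    push @declared, $p->new_ns_prefixes;

    # Build a comparable name from a local name and a namespace URI.
    my $wanted = $p->generate_ns_name('item', 'urn:example:x');
    $matched = "$el" if $p->eq_name($el, $wanted);
});
$parser->parse($doc);
$parser->release;

print "$_ => $uri_of{$_}\n" for sort keys %uri_of;
print "declared prefixes: @declared\n";
print "matched element:   $matched\n";
```

       Comparing with eq_name rather than the eq operator matters because two
       elements with the same local name can belong to different namespaces.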

       recognized_string
           Returns the string from the document that was recognized in order
           to call the current handler. For instance, when called from a start
           handler, it will give us the start-tag string. The string is
           encoded in UTF-8.  This method doesn't return a meaningful string
           inside declaration handlers.

       original_string
           Returns the verbatim string from the document that was recognized
           in order to call the current handler. The string is in the original
           document encoding. This method doesn't return a meaningful string
           inside declaration handlers.
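
       For example, a start handler could collect the raw start tags it is
       called for. This is a small sketch with a made-up document; only
       recognized_string is exercised.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my @tags;
my $parser = XML::Parser::Expat->new;
$parser->setHandlers(Start => sub {
    my ($p) = @_;
    # The markup that triggered this call, as UTF-8.
    push @tags, $p->recognized_string;
});
$parser->parse('<a><b k="v">t</b></a>');
$parser->release;

print "$_\n" for @tags;
```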

       default_current
           When called from a handler, causes the sequence of characters that
           generated the corresponding event to be sent to the default handler
           (if one is registered). Use of this method is deprecated in favor
           of the recognized_string method, which you can use without
           installing a default handler. This method doesn't deliver a
           meaningful string to the default handler when called from inside
           declaration handlers.

       xpcroak(message)
           Concatenate onto the given message the current line number within
           the XML document plus the message implied by ErrorContext. Then
           croak with the formed message.

       xpcarp(message)
           Concatenate onto the given message the current line number within
           the XML document plus the message implied by ErrorContext. Then
           carp with the formed message.

       current_line
           Returns the line number of the current position of the parse.

       current_column
           Returns the column number of the current position of the parse.

       current_byte
           Returns the current position of the parse, as a byte offset into
           the document.

       base([NEWBASE])
           Returns the current value of the base for resolving relative URIs.
           If NEWBASE is supplied, changes the base to that value.

       context
           Returns a list of element names that represent open elements, with
           the last one being the innermost. Inside start and end tag
           handlers, the last one is the parent of the element whose tag is
           being handled.

       current_element
           Returns the name of the innermost currently opened element. Inside
           start or end handlers, returns the parent of the element associated
           with those tags.

       in_element(NAME)
           Returns true if NAME is equal to the name of the innermost
           currently opened element. If namespace processing is being used and
           you want to check against a name that may be in a namespace, then
           use the generate_ns_name method to create the NAME argument.

       within_element(NAME)
           Returns the number of times the given name appears in the context
           list.  If namespace processing is being used and you want to check
           against a name that may be in a namespace, then use the
           generate_ns_name method to create the NAME argument.

       depth
           Returns the size of the context list.
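
       The context methods are easiest to see from inside a start handler,
       where the element being opened is not yet part of the context. The
       sketch below records what each method reports; the document and the
       report format are invented for the example.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my @report;
my $parser = XML::Parser::Expat->new;
$parser->setHandlers(Start => sub {
    my ($p, $el) = @_;

    # Inside a start handler the context does not yet include $el,
    # so current_element is the parent (undef for the root element).
    my $parent = $p->current_element;
    push @report, sprintf('%s: depth=%d parent=%s path=/%s',
        $el, $p->depth,
        defined $parent ? $parent : '-',
        join('/', $p->context));
});
$parser->parse('<a><b><c/></b></a>');
$parser->release;

print "$_\n" for @report;
```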

       element_index
           Returns an integer that is the depth-first visit order of the
           current element. This will be zero outside of the root element. For
           example, this will return 1 when called from the start handler for
           the root element start tag.

       skip_until(INDEX)
           INDEX is an integer that represents an element index. When this
           method is called, all handlers are suspended until the start tag
           for an element that has an index number equal to INDEX is seen. If
           a start handler has been set, then this is the first tag that the
           start handler will see after skip_until has been called.

       position_in_context(LINES)
           Returns a string that shows the current parse position. LINES
           should be an integer >= 0 that represents the number of lines on
           either side of the current parse line to place into the returned
           string.

       xml_escape(TEXT [, CHAR [, CHAR ...]])
           Returns TEXT with markup characters turned into character entities.
           Any additional characters provided as arguments are also turned
           into character references where found in TEXT.
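
       A short sketch of xml_escape, assuming that the markup characters
       escaped by default are "&" and "<" and that ">" has to be requested as
       an extra character. The input string is made up for the example.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

# xml_escape is an instance method, so a parser object is needed
# even though nothing is parsed here.
my $parser  = XML::Parser::Expat->new;
my $escaped = $parser->xml_escape('1 < 2 & 3 > 2', '>');
$parser->release;

print "$escaped\n";
```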

       parse(SOURCE)
           The SOURCE parameter should either be a string containing the whole
           XML document, or it should be an open IO::Handle. Only a single
           document may be parsed for a given instance of XML::Parser::Expat,
           so this will croak if it's been called previously for this
           instance.

       parsestring(XML_DOC_STRING)
           Parses the given string as an XML document. Only a single document
           may be parsed for a given instance of XML::Parser::Expat, so this
           will die if either parsestring or parsefile has been called for
           this instance previously.

           This method is deprecated in favor of the parse method.

       parsefile(FILENAME)
           Parses the XML document in the given file. Will die if parsestring
           or parsefile has been called previously for this instance.

       is_defaulted(ATTNAME)
           NO LONGER WORKS. To find out if an attribute is defaulted please
           use the specified_attr method.

       specified_attr
           When the start handler receives lists of attributes and values, the
           non-defaulted (i.e. explicitly specified) attributes occur in the
           list first. This method returns the number of specified items in
           the list.  So if this number is equal to the length of the list,
           there were no defaulted values. Otherwise the number points to the
           index of the first defaulted attribute name.
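
       To make the counting concrete, the sketch below parses a document
       whose internal subset supplies a default for one attribute. The DTD,
       the attribute names and the variables are invented for the example.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my $doc = <<'XML';
<!DOCTYPE doc [
<!ATTLIST doc kind CDATA "plain">
]>
<doc id="1"/>
XML

my (@atts, $specified);
my $parser = XML::Parser::Expat->new;
$parser->setHandlers(Start => sub {
    my ($p, $el, @list) = @_;
    @atts      = @list;               # flat list: name, value, name, value
    $specified = $p->specified_attr;  # counts names and values together
});
$parser->parse($doc);
$parser->release;

# Everything from index $specified onwards came from the DTD default.
my @defaulted = @atts[ $specified .. $#atts ];
print "all:       @atts\n";
print "defaulted: @defaulted\n";
```

       Taking the attributes as a list rather than a hash in the handler
       preserves their order, which is what makes the returned index usable.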

       finish
           Unsets all handlers (including internal ones that set context), but
           expat continues parsing to the end of the document or until it
           finds an error.  It should finish up a lot faster than with the
           handlers set.

       release
           There are data structures used by XML::Parser::Expat that have
           circular references. This means that these structures will never be
           garbage collected unless these references are explicitly broken.
           Calling this method breaks those references (and makes the instance
           unusable).

           Normally, higher level calls handle this for you, but if you are
           using XML::Parser::Expat directly, then it's your responsibility to
           call it.

   XML::Parser::ContentModel Methods
       The element declaration handlers are passed objects of this class as
       the content model of the element declaration. They also represent
       content particles, components of a content model.

       When referred to as a string, these objects are automatically
       converted to a string representation of the model (or content
       particle).

       isempty
           This method returns true if the object is "EMPTY", false otherwise.

       isany
           This method returns true if the object is "ANY", false otherwise.

       ismixed
           This method returns true if the object is "(#PCDATA)" or
           "(#PCDATA|...)*", false otherwise.

       isname
           This method returns true if the object is an element name.

       ischoice
           This method returns true if the object is a choice of content
           particles.

       isseq
           This method returns true if the object is a sequence of content
           particles.

       quant
           This method returns undef or a string representing the quantifier
           ('?', '*', '+') associated with the model or particle.

       children
           This method returns undef or (for mixed, choice, and sequence
           types) an array of component content particles. There will always
           be at least one component for choices and sequences, but for a
           mixed content model of pure PCDATA, "(#PCDATA)", an undef is
           returned.
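
       As an illustration, an Element handler might classify each declared
       content model with the methods above. This is a sketch only; the DTD
       and the %kind and %parts hashes are hypothetical.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my $doc = <<'XML';
<!DOCTYPE doc [
<!ELEMENT doc (head, body*)>
<!ELEMENT head (#PCDATA)>
<!ELEMENT body EMPTY>
]>
<doc><head>t</head></doc>
XML

my (%kind, %parts);
my $parser = XML::Parser::Expat->new;
$parser->setHandlers(Element => sub {
    my ($p, $name, $model) = @_;

    $kind{$name} = $model->isempty  ? 'empty'
                 : $model->isany    ? 'any'
                 : $model->ismixed  ? 'mixed'
                 : $model->isseq    ? 'sequence'
                 : $model->ischoice ? 'choice'
                 :                    'name';

    # children returns undef when there are no components, so filter it.
    # Interpolating a particle uses its string representation.
    $parts{$name} = [ map { "$_" } grep { defined } $model->children ];
});
$parser->parse($doc);
$parser->release;

for my $name (sort keys %kind) {
    print "$name: $kind{$name} [@{ $parts{$name} }]\n";
}
```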

   XML::Parser::ExpatNB Methods
       The class XML::Parser::ExpatNB is a subclass of XML::Parser::Expat used
       for non-blocking access to the expat library. It does not support the
       parse, parsestring, or parsefile methods, but it does have these
       additional methods:

       parse_more(DATA)
           Feed expat more text to munch on.

       parse_done
           Tell expat that it's gotten the whole document.
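
       A sketch of incremental parsing follows. It assumes that an
       XML::Parser::ExpatNB object can be created directly through the
       constructor it inherits from XML::Parser::Expat, and that loading
       XML::Parser::Expat makes the subclass available; the chunk boundaries
       and names are made up for the example.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my @seen;

# Assumption: the inherited new works for the non-blocking subclass.
my $nb = XML::Parser::ExpatNB->new;
$nb->setHandlers(Start => sub { my ($p, $el) = @_; push @seen, $el });

# Chunks may split the document anywhere, even in the middle of a tag.
$nb->parse_more('<doc><it');
$nb->parse_more('em/></doc>');

# Signal end of input; well-formedness errors for the whole
# document are reported here at the latest.
$nb->parse_done;

print "saw: @seen\n";
```

       Handlers fire as soon as expat has seen enough input to recognize an
       event, so the start handler for the second element cannot run until
       the second chunk arrives.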

FUNCTIONS

       XML::Parser::Expat::load_encoding(ENCODING)
           Load an external encoding. ENCODING is either the name of an
           encoding or the name of a file. The basename is converted to
           lowercase and a '.enc' extension is appended unless there's one
           already there. Then, unless it's an absolute pathname (i.e. begins
           with '/'), the first file by that name discovered in the
           @Encoding_Path path list is used.

           The encoding in the file is loaded and kept in the %Encoding_Table
           table. Earlier encodings of the same name are replaced.

           This function is automatically called by expat when it encounters
           an encoding it doesn't know about. Expat shouldn't call this twice
           for the same encoding name. The only reason users should use this
           function is to explicitly load an encoding not contained in the
           @Encoding_Path list.

AUTHORS

       Larry Wall <larry@wall.org> wrote version 1.0.

       Clark Cooper <coopercc@netheaven.com> picked up support, changed the
       API for this version (2.x), provided documentation, and added some
       standard package features.

perl v5.30.1                      2020-01-30                          Expat(3)