expat(n)                                                              expat(n)

______________________________________________________________________________

NAME
       expat - Creates an instance of an expat parser object

SYNOPSIS
       package require tdom

       expat ?parsername? ?-namespace? ?arg arg ...?

       xml::parser ?parsername? ?-namespace? ?arg arg ...?
______________________________________________________________________________

DESCRIPTION
       The parsers created with expat or xml::parser (which is just another
       name for the same command in its own namespace) can parse any kind of
       well-formed XML. They are stream-oriented XML parsers: you register
       handler scripts with the parser before starting the parse, and these
       scripts are called when the parser discovers the associated structures
       in the document being parsed. A start tag is one example of a
       structure for which you may register a handler script.

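As a minimal sketch of this event-driven model, the following script registers a start and an end handler, parses a small document, and records the order in which the callbacks fire. The proc names onStart and onEnd, the global list events, and the sample document are illustrative assumptions, not part of the package.

```tcl
package require tdom

# Illustrative global that records the callbacks in the order they fire.
set events {}

proc onStart {name attlist} {
    lappend ::events [list start $name]
}

proc onEnd {name} {
    lappend ::events [list end $name]
}

# Handlers are registered before the parse starts.
set parser [expat -elementstartcommand onStart -elementendcommand onEnd]
$parser parse {<doc><item/><item/></doc>}
$parser free

puts $events
```

The handlers only collect data here; a real application would typically build its own data structure inside them.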
       The parsers do not validate the XML document. They do parse the
       internal DTD and, on request, the external DTD and external entities,
       provided you resolve the identifiers of the external entities with the
       -externalentitycommand script (see there).

       Additionally, the Tcl extension code that implements this command
       provides an API for adding handlers coded at the C level. So far,
       there is one such parser extension command, "tdom". The handler set
       installed by this extension builds an in-memory "tDOM" DOM tree while
       the parser is parsing the input.

       It is possible to register an arbitrary number of different handler
       scripts and C-level handlers for most of the events. When the event
       occurs, they are called in turn.

OPTIONS
       -namespace

              Enables namespace parsing. You must use this option when
              creating the parser with the expat or xml::parser command. You
              cannot enable or disable namespace parsing with <parserobj>
              configure ....

       -final boolean

              This option indicates whether the document data next presented
              to the parse method is the final part of the document. A value
              of "0" indicates that more data is expected. A value of "1"
              indicates that no more data is expected. The default value is
              "1".

              If this option is set to "0" then the parser will not report
              certain errors if the XML data is not well-formed upon end of
              input, such as unclosed or unbalanced start or end tags.
              Instead, some data may be saved by the parser until the next
              call to the parse method, thus delaying the reporting of some
              of the data.

              If this option is set to "1" then documents which are not
              well-formed upon end of input will generate an error.

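The following sketch shows one way -final might be used to feed a document to the parser in pieces. The chunk boundaries, the proc name onStart and the global list names are assumptions made for illustration; in practice the chunks would usually come from a socket or a file read loop.

```tcl
package require tdom

set names {}

proc onStart {name attlist} {
    lappend ::names $name
}

# Announce that more data will follow each call to the parse method.
set parser [expat -elementstartcommand onStart -final 0]
$parser parse {<doc><a/>}
$parser parse {<b/>}

# The next chunk is the last one, so well-formedness is checked at its end.
$parser configure -final 1
$parser parse {</doc>}
$parser free

puts $names
```

Because some data may be held back while -final is "0", code that depends on a callback having fired should only rely on it after the final chunk has been parsed.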
       -baseurl url

              Reports the base URL of the document to the parser.

       -elementstartcommand script

              Specifies a Tcl command to associate with the start tag of an
              element. The actual command consists of this option followed by
              at least two arguments: the element type name and the attribute
              list.

              The attribute list is a Tcl list consisting of name/value
              pairs, suitable for passing to the array set Tcl command.

              Example:

                  proc HandleStart {name attlist} {
                      puts stderr "Element start ==> $name has attributes $attlist"
                  }

                  $parser configure -elementstartcommand HandleStart

                  $parser parse {<test id="123"></test>}

              This would result in the following command being invoked:

                  HandleStart test {id 123}

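Since the attribute list is a flat list of name/value pairs, a handler can load it into an array and look attributes up by name. The sketch below assumes a hypothetical global list ids and a sample document invented for illustration.

```tcl
package require tdom

set ids {}

proc HandleStart {name attlist} {
    # Turn the flat name/value list into an array keyed by attribute name.
    array set att $attlist
    if {[info exists att(id)]} {
        lappend ::ids $att(id)
    }
}

set parser [expat -elementstartcommand HandleStart]
$parser parse {<doc><test id="123"/><test/><test id="456" lang="en"/></doc>}
$parser free

puts $ids
```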
       -elementendcommand script

              Specifies a Tcl command to associate with the end tag of an
              element. The actual command consists of this option followed by
              at least one argument: the element type name. In addition, if
              the -reportempty option is set then the command may be invoked
              with the -empty configuration option to indicate whether it is
              an empty element. See the description of the -reportempty
              option for an example.

              Example:

                  proc HandleEnd {name} {
                      puts stderr "Element end ==> $name"
                  }

                  $parser configure -elementendcommand HandleEnd

                  $parser parse {<test id="123"></test>}

              This would result in the following command being invoked:

                  HandleEnd test

       -characterdatacommand script

              Specifies a Tcl command to associate with character data in the
              document, i.e. text. The actual command consists of this option
              followed by one argument: the text.

              It is not guaranteed that character data will be passed to the
              application in a single call to this command. That is, the
              application should be prepared to receive multiple invocations
              of this callback with no intervening callbacks from other
              features.

              Example:

                  proc HandleText {data} {
                      puts stderr "Character data ==> $data"
                  }

                  $parser configure -characterdatacommand HandleText

                  $parser parse {<test>this is a test document</test>}

              This would result in the following command being invoked:

                  HandleText {this is a test document}

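Because one run of text may arrive in several callbacks, a handler that needs the complete text of an element usually has to accumulate it. The following sketch appends each piece to a buffer and reads the buffer when the element ends. The names buffer, titles, onStart, onText and onEnd are hypothetical.

```tcl
package require tdom

set buffer ""
set titles {}

proc onStart {name attlist} {
    # Start every element with an empty buffer.
    set ::buffer ""
}

proc onText {data} {
    # May be called more than once for one run of text, so append.
    append ::buffer $data
}

proc onEnd {name} {
    if {$name eq "title"} {
        lappend ::titles $::buffer
    }
    set ::buffer ""
}

set parser [expat -elementstartcommand onStart \
                  -characterdatacommand onText \
                  -elementendcommand onEnd]
$parser parse {<doc><title>Tom &amp; Jerry</title><title>Second</title></doc>}
$parser free

puts $titles
```

This simple buffer is enough for elements that contain only text; mixed content would need a buffer per open element.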
       -processinginstructioncommand script

              Specifies a Tcl command to associate with processing
              instructions in the document. The actual command consists of
              this option followed by two arguments: the PI target and the PI
              data.

              Example:

                  proc HandlePI {target data} {
                      puts stderr "Processing instruction ==> $target $data"
                  }

                  $parser configure -processinginstructioncommand HandlePI

                  $parser parse {<test><?special this is a processing instruction?></test>}

              This would result in the following command being invoked:

                  HandlePI special {this is a processing instruction}

       -notationdeclcommand script

              Specifies a Tcl command to associate with notation declarations
              in the document. The actual command consists of this option
              followed by four arguments: the notation name, the base URI of
              the document (that is, whatever was set by the -baseurl
              option), the system identifier and the public identifier. The
              notation name is never empty; the other arguments may be.

       -externalentitycommand script

              Specifies a Tcl command to associate with references to
              external entities in the document. The actual command consists
              of this option followed by three arguments: the base URI, the
              system identifier of the entity and the public identifier of
              the entity. The base URI and the public identifier may be the
              empty list.

              This handler script has to return a Tcl list consisting of
              three elements. The first element signals how the external
              entity is returned to the processor. At the moment, the three
              allowed types are "string", "channel" and "filename". The
              second element has to be the (absolute) base URI of the
              external entity to be parsed. The third element is the data:
              the already read content of the external entity as a string
              for type "string", the name of a Tcl channel for type
              "channel", or the path to the external entity to be read for
              type "filename".

              Behind the scenes, the external entity referenced by the
              returned Tcl channel, string or file name is parsed with an
              expat external entity parser that uses the same handler sets
              as the main parser. If parsing of the external entity fails,
              the whole parse is stopped with an error message. If a Tcl
              command registered as -externalentitycommand isn't able to
              resolve an external entity, it is allowed to return
              TCL_CONTINUE. In this case, the wrapper gives the next
              registered -externalentitycommand a try. If no
              -externalentitycommand is able to handle the external entity,
              parsing stops with an error.

              Example:

                  proc externalEntityRefHandler {base systemId publicId} {
                      if {![regexp {^[a-zA-Z]+:/} $systemId]} {
                          # Relative system identifier: resolve it against
                          # the directory of the base URI.
                          regsub {^[a-zA-Z]+:} $base {} base
                          set basedir [file dirname $base]
                          set systemId "[set basedir]/[set systemId]"
                      } else {
                          # Absolute URI: strip the scheme to get a path.
                          regsub {^[a-zA-Z]+:} $systemId {} systemId
                      }
                      if {[catch {set fd [open $systemId]}]} {
                          return -code error \
                              "Failed to open external entity $systemId"
                      }
                      return [list channel $systemId $fd]
                  }

                  set parser [expat -externalentitycommand externalEntityRefHandler \
                                    -baseurl "file:///local/doc/doc.xml" \
                                    -paramentityparsing notstandalone]
                  $parser parse {<?xml version='1.0'?>
                      <!DOCTYPE test SYSTEM "test.dtd">
                      <test/>}

              This would result in the following command being invoked:

                  externalEntityRefHandler file:///local/doc/doc.xml test.dtd {}

              The parser tries to resolve external entities via this handler
              script only when necessary. This means that external parameter
              entities trigger this handler only if -paramentityparsing is
              used with the argument "always", or if -paramentityparsing is
              used with the argument "notstandalone" and the document isn't
              marked as standalone.

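The "string" return type can be sketched as follows, with a resolver that serves the external subset from memory instead of opening a file. The proc names resolveFromMemory and onText, the base URL, the system identifier mem.dtd and the entity greeting are all assumptions invented for this illustration.

```tcl
package require tdom

set text ""

proc resolveFromMemory {base systemId publicId} {
    # Hand the already available DTD content back as a string. The second
    # list element is the base URI to use for the external entity; here the
    # base URI of the document is simply passed through.
    set dtd {<!ENTITY greeting "hello">}
    return [list string $base $dtd]
}

proc onText {data} {
    append ::text $data
}

set parser [expat -externalentitycommand resolveFromMemory \
                  -characterdatacommand onText \
                  -baseurl "file:///local/doc/doc.xml" \
                  -paramentityparsing always]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE doc SYSTEM "mem.dtd">
<doc>&greeting;</doc>}
$parser free

puts $text
```

Setting -paramentityparsing to "always" matters here: with the default "never" the resolver would not be called for the external subset.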
       -unknownencodingcommand script

              Not implemented at Tcl level.

       -startnamespacedeclcommand script

              Specifies a Tcl command to associate with the start of the
              scope of a namespace declaration in the document. The actual
              command consists of this option followed by two arguments: the
              namespace prefix and the namespace URI. For an xmlns attribute,
              prefix will be the empty list. For an xmlns="" attribute, uri
              will be the empty list. The calls to the start and end element
              handlers occur between the calls to the start and end namespace
              declaration handlers.

       -endnamespacedeclcommand script

              Specifies a Tcl command to associate with the end of the scope
              of a namespace declaration in the document. The actual command
              consists of this option followed by the namespace prefix as
              argument. In case of an xmlns attribute, prefix will be the
              empty list. The calls to the start and end element handlers
              occur between the calls to the start and end namespace
              declaration handlers.

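A minimal sketch of the two namespace declaration handlers follows. It creates the parser with -namespace and records each declaration as its scope starts and ends. The proc names nsStart and nsEnd, the global list events and the namespace URIs are hypothetical.

```tcl
package require tdom

set events {}

proc nsStart {prefix uri} {
    lappend ::events [list start $prefix $uri]
}

proc nsEnd {prefix} {
    lappend ::events [list end $prefix]
}

# -namespace has to be given when the parser is created.
set parser [expat -namespace \
                  -startnamespacedeclcommand nsStart \
                  -endnamespacedeclcommand nsEnd]
$parser parse {<doc xmlns="urn:example:default" xmlns:x="urn:example:x"><x:item/></doc>}
$parser free

foreach event $events {
    puts $event
}
```

The default namespace declaration is reported with an empty prefix, as described above.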
       -commentcommand script

              Specifies a Tcl command to associate with comments in the
              document. The actual command consists of this option followed
              by one argument: the comment data.

              Example:

                  proc HandleComment {data} {
                      puts stderr "Comment ==> $data"
                  }

                  $parser configure -commentcommand HandleComment

                  $parser parse {<test><!-- this is <obviously> a comment --></test>}

              This would result in the following command being invoked:

                  HandleComment { this is <obviously> a comment }

       -notstandalonecommand script

              This Tcl command is called if the document is not standalone
              (it has an external subset or a reference to a parameter
              entity, but does not have standalone="yes"). It is called with
              no additional arguments.

       -startcdatasectioncommand script

              Specifies a Tcl command to associate with the start of a CDATA
              section. It is called with no additional arguments.

       -endcdatasectioncommand script

              Specifies a Tcl command to associate with the end of a CDATA
              section. It is called with no additional arguments.

       -elementdeclcommand script

              Specifies a Tcl command to associate with element declarations.
              The actual command consists of this option followed by two
              arguments: the name of the element and the content model. The
              content model argument is a Tcl list of four elements. The
              first list element specifies the type of the XML element; the
              six possible types are reported as "MIXED", "NAME", "EMPTY",
              "CHOICE", "SEQ" or "ANY". The second list element reports the
              quantifier of the content model in XML syntax ("?", "*" or "+")
              or is the empty list. If the type is "MIXED", then the
              quantifier will be "{}", indicating a PCDATA-only element, or
              "*", with the elements allowed to intermix with PCDATA given as
              a Tcl list in the fourth list element. If the type is "NAME",
              the name is the third list element; otherwise the third list
              element is the empty list. If the type is "CHOICE" or "SEQ",
              the fourth list element will contain a list of content models
              built like this one. The "EMPTY", "ANY", and "MIXED" types will
              only occur at top level.

              Examples:

                  proc elDeclHandler {name content} {
                      puts "$name $content"
                  }

                  set parser [expat -elementdeclcommand elDeclHandler]
                  $parser parse {<?xml version='1.0'?>
                      <!DOCTYPE test [
                      <!ELEMENT test (#PCDATA)>
                      ]>
                      <test>foo</test>}

              This would result in the following command being invoked:

                  elDeclHandler test {MIXED {} {} {}}

                  $parser reset
                  $parser parse {<?xml version='1.0'?>
                      <!DOCTYPE test [
                      <!ELEMENT test (a|b)>
                      ]>
                      <test><a/></test>}

              This would result in the following command being invoked:

                  elDeclHandler test {CHOICE {} {} {{NAME {} a {}} {NAME {} b {}}}}

       -attlistdeclcommand script

              Specifies a Tcl command to associate with attlist declarations.
              The actual command consists of this option followed by five
              arguments. The attlist declaration handler is called for *each*
              attribute, so a single attlist declaration that declares
              multiple attributes will generate multiple calls to this
              handler. The arguments are the name of the element this
              attribute belongs to, the name of the attribute, the type of
              the attribute, the default value (may be the empty list) and a
              required flag. If this flag is true and the default value is
              not the empty list, then this is a "#FIXED" default.

              Example:

                  proc attlistHandler {elname name type default isRequired} {
                      puts "$elname $name $type $default $isRequired"
                  }

                  set parser [expat -attlistdeclcommand attlistHandler]
                  $parser parse {<?xml version='1.0'?>
                      <!DOCTYPE test [
                      <!ELEMENT test EMPTY>
                      <!ATTLIST test
                                id      ID      #REQUIRED
                                name    CDATA   #IMPLIED>
                      ]>
                      <test/>}

              This would result in the following commands being invoked:

                  attlistHandler test id ID {} 1
                  attlistHandler test name CDATA {} 0

       -startdoctypedeclcommand script

              Specifies a Tcl command to associate with the start of the
              DOCTYPE declaration. This command is called before any DTD or
              internal subset is parsed. The actual command consists of this
              option followed by four arguments: the doctype name, the system
              identifier, the public identifier and a boolean that shows
              whether the DOCTYPE has an internal subset.

       -enddoctypedeclcommand script

              Specifies a Tcl command to associate with the end of the
              DOCTYPE declaration. This command is called after processing
              any external subset. It is called with no additional arguments.

       -paramentityparsing never|notstandalone|always

              "never" disables expansion of parameter entities, "always"
              always expands them, and "notstandalone" expands them only if
              the document isn't marked "standalone='yes'". The default is
              "never".

       -entitydeclcommand script

              Specifies a Tcl command to associate with any entity
              declaration. The actual command consists of this option
              followed by seven arguments: the entity name, a boolean
              identifying parameter entities, the value of the entity, the
              base URI, the system identifier, the public identifier and the
              notation name. Depending on the type of entity declaration,
              some of these arguments may be the empty list.

       -ignorewhitecdata boolean

              If this flag is set, element content which contains only
              whitespace isn't reported with the -characterdatacommand.

       -ignorewhitespace boolean

              Another name for -ignorewhitecdata; see there.

       -handlerset name

              This option sets the Tcl handler set scope for the configure
              options. Any option/value pairs following this option in the
              same call to the parser modify the named Tcl handler set. If
              you don't use this option, you are modifying the default Tcl
              handler set, named "default".

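As a sketch of how handler sets might be combined, the following script registers one start handler in the default handler set and a second one in a handler set named "audit" (a name chosen for this example). Both are called, in turn, for every start tag. The proc names and the two global lists are hypothetical.

```tcl
package require tdom

set main {}
set audit {}

proc mainStart {name attlist} {
    lappend ::main $name
}

proc auditStart {name attlist} {
    lappend ::audit "saw $name"
}

# Without -handlerset, the option goes to the handler set "default".
set parser [expat -elementstartcommand mainStart]

# Options after -handlerset in the same call apply to the set "audit".
$parser configure -handlerset audit -elementstartcommand auditStart

$parser parse {<doc><a/></doc>}
$parser free

puts $main
puts $audit
```

Keeping independent concerns in separate handler sets means one set can stop its own processing with a "break" or "continue" return code without affecting the other.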
       -noexpand boolean

              Normally, the parser will try to expand references to entities
              defined in the internal subset. If this option is set to a true
              value, these entities are not expanded but reported literally
              via the default handler. Warning: if you set this option to
              true and don't install a default handler (with the
              -defaultcommand option) for every handler set of the parser,
              all internal entities are silently lost for the handler sets
              without a default handler.

       -useForeignDTD boolean

              If boolean is true and the document does not have an external
              subset, the parser will call the -externalentitycommand script
              with empty values for the systemId and publicId arguments. This
              option must be set before the first piece of data is parsed;
              setting it after parsing has started has no effect. The default
              is not to use a foreign DTD. The default is restored after
              resetting the parser. Please note that a -paramentityparsing
              value of "never" (which is the default) suppresses any call to
              the -externalentitycommand script. Please also note that, if
              the document doesn't have an internal subset either, the
              -startdoctypedeclcommand and -enddoctypedeclcommand scripts, if
              set, are not called.

COMMAND METHODS
       parser configure option value ?option value?

              Sets configuration options for the parser. Every command
              option except -namespace can be set or modified with this
              method.

       parser cget ?-handlerset name? option

              Returns the current value of the configuration option option
              for the parser.

              If the -handlerset option is used, the configuration for the
              named handler set is returned.

       parser free

              Deletes the parser and the parser command. A parser cannot be
              freed from within one of its handler callbacks (neither
              directly nor indirectly); an attempt to do so raises a Tcl
              error.

       parser get -specifiedattributecount|-idattributeindex|-currentbytecount|-currentlinenumber|-currentcolumnnumber|-currentbyteindex

              -specifiedattributecount

                     Returns the number of attribute/value pairs passed in
                     the last call to the elementstartcommand that were
                     specified in the start tag rather than defaulted. Each
                     attribute/value pair counts as 2; thus this corresponds
                     to an index into the attribute list passed to the
                     elementstartcommand.

              -idattributeindex

                     Returns the index of the ID attribute passed in the last
                     call to XML_StartElementHandler, or -1 if there is no ID
                     attribute. Each attribute/value pair counts as 2; thus
                     this corresponds to an index into the attribute list
                     passed to the elementstartcommand.

              -currentbytecount

                     Returns the number of bytes in the current event.
                     Returns 0 if the event is in an internal entity.

              -currentlinenumber

                     Returns the line number of the current parse location.

              -currentcolumnnumber

                     Returns the column number of the current parse location.

              -currentbyteindex

                     Returns the byte index of the current parse location.

              Only one value may be requested at a time.

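The get method is mostly useful from inside a handler, where it describes the event currently being reported. The sketch below records the line number at which each start tag is reported; the proc name onStart, the global list lines and the sample document are assumptions for illustration.

```tcl
package require tdom

set lines {}

proc onStart {name attlist} {
    # ::parser is the parser command created further down.
    lappend ::lines [list $name [$::parser get -currentlinenumber]]
}

set parser [expat -elementstartcommand onStart]
$parser parse "<doc>\n  <a/>\n  <b/>\n</doc>"
$parser free

foreach entry $lines {
    puts $entry
}
```

Position information of this kind is typically used to enrich application-level error messages.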
       parser parse data

              Parses the XML string data. The event callback scripts will be
              called as their triggering events occur. This method cannot be
              used from within a callback (neither directly nor indirectly)
              of the same parser; an attempt to do so raises an error.

       parser parsechannel channelID

              Reads the XML data out of the Tcl channel channelID (starting
              at the current access position, without any seek) up to the end
              of file condition and parses that data. The channel encoding is
              respected. Use the helper proc tDOM::xmlOpenFile from the tDOM
              script library to open a file if you want to use this method.
              This method cannot be used from within a callback (neither
              directly nor indirectly) of the same parser; an attempt to do
              so raises an error.

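A possible use of parsechannel together with tDOM::xmlOpenFile is sketched below. To stay self-contained, the script first writes a small sample file; it assumes Tcl 8.6 or later for file tempfile, and the proc name onStart and the global list names are hypothetical.

```tcl
package require tdom

set names {}

proc onStart {name attlist} {
    lappend ::names $name
}

# Write a small sample document to a temporary file (Tcl 8.6 or later).
set out [file tempfile path]
fconfigure $out -encoding utf-8
puts -nonewline $out {<?xml version="1.0" encoding="UTF-8"?><doc><a/><b/></doc>}
close $out

# Open the file with the helper proc, which sets up the channel encoding.
set chan [tDOM::xmlOpenFile $path]
set parser [expat -elementstartcommand onStart]
$parser parsechannel $chan
close $chan
$parser free
file delete $path

puts $names
```

The channel is closed by the script itself once parsing has finished.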
       parser parsefile filename

              Reads the XML data directly out of the file named filename and
              parses that data. This is done with low-level file operations.
              The XML data must be in US-ASCII, ISO-8859-1, UTF-8 or UTF-16
              encoding. If applicable, this is the fastest way to parse XML
              data. This method cannot be used from within a callback
              (neither directly nor indirectly) of the same parser; an
              attempt to do so raises an error.

       parser reset

              Resets the parser in preparation for parsing another document.
              A parser cannot be reset from within one of its handler
              callbacks (neither directly nor indirectly); an attempt to do
              so raises a Tcl error.

CALLBACK COMMAND RETURN CODES
       A script invoked for any of the parser callback commands, such as
       -elementstartcommand, -elementendcommand, etc., may return a code
       other than "ok" or "error": all callbacks may in addition return
       "break" or "continue".

       If a callback script returns an "error" code then processing of the
       document is terminated and the error is propagated in the usual
       fashion.

       If a callback script returns a "break" code then all further
       processing of every handler script of this Tcl handler set is
       suppressed for the rest of the parse. This does not influence any
       other handler set.

       If a callback script returns a "continue" code then processing of the
       current element, and its children, ceases for every handler script of
       this Tcl handler set, and processing continues with the next (sibling)
       element. This does not influence any other handler set.

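The effect of the "continue" and "break" codes can be sketched with a single start handler. The element names skip and stop, the proc name onStart and the global list seen are invented for this illustration.

```tcl
package require tdom

set seen {}

proc onStart {name attlist} {
    lappend ::seen $name
    if {$name eq "skip"} {
        # Do not report this element's children to this handler set.
        return -code continue
    }
    if {$name eq "stop"} {
        # Suppress this handler set for the rest of the parse.
        return -code break
    }
}

set parser [expat -elementstartcommand onStart]
$parser parse {<doc><skip><inner/></skip><keep/><stop/><late/></doc>}
$parser free

puts $seen
```

The document is still parsed to its end in both cases; only the callbacks of the affected handler set are suppressed.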
SEE ALSO
       expatapi, tdom

KEYWORDS
       SAX

Tcl                                                                   expat(n)