expat(n)                                                              expat(n)

______________________________________________________________________________

NAME

       expat - Creates an instance of an expat parser object

SYNOPSIS

       package require tdom

       expat ?parsername? ?-namespace? ?arg arg ...?

       xml::parser ?parsername? ?-namespace? ?arg arg ...?
_________________________________________________________________

DESCRIPTION

       Parsers created with expat or xml::parser (which is just another name
       for the same command in its own namespace) can parse any kind of
       well-formed XML. The parsers are stream-oriented XML parsers. This
       means that you register handler scripts with the parser before
       starting the parse. These handler scripts are called when the parser
       discovers the associated structures in the document being parsed. A
       start tag is an example of the kind of structure for which you may
       register a handler script.
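
       The following is a minimal sketch of that workflow. It assumes the
       tdom package is installed. The handler name RecordStart and the
       variable ::names are illustrative only. Each time the parser calls the
       registered script, it appends the element name and the attribute list.

```tcl
package require tdom

# Illustrative handler: remember each element name in document order.
set ::names {}
proc RecordStart {name attlist} {
    lappend ::names $name
}

# Register the handler when creating the parser, then parse a string.
set parser [expat -elementstartcommand RecordStart]
$parser parse {<doc><a/><b><c/></b></doc>}
$parser free

puts $::names ;# doc a b c
```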

       The parsers do not validate the XML document. They do parse the
       internal DTD and, on request, the external DTD and external entities,
       provided you resolve the identifiers of the external entities with the
       -externalentitycommand script (see there).

       Additionally, the Tcl extension code that implements this command
       provides an API for adding handlers coded at the C level. At present,
       the only such parser extension command is "tdom". The handler set
       installed by this extension builds an in-memory tDOM DOM tree while the
       parser is parsing the input.

       It is possible to register an arbitrary number of different handler
       scripts and C-level handlers for most of the events. When an event
       occurs, they are called in turn.

COMMAND OPTIONS

       -namespace

              Enables namespace parsing. You must use this option when creating
              the parser with the expat or xml::parser command. Namespace
              parsing cannot be enabled (or disabled) later with <parserobj>
              configure ....

       -final  boolean

              This option indicates whether the document data next presented
              to the parse method is the final part of the document. A value
              of "0" indicates that more data is expected. A value of "1"
              indicates that no more is expected. The default value is "1".

              If this option is set to "0" then the parser will not report
              certain errors if the XML data is not well-formed upon end of
              input, such as unclosed or unbalanced start or end tags. Instead
              some data may be saved by the parser until the next call to the
              parse method, thus delaying the reporting of some of the data.

              If this option is set to "1" then documents which are not
              well-formed upon end of input will generate an error.
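
              Here is a sketch of incremental parsing with -final. It assumes
              the document arrives in arbitrary pieces. The pieces and the
              handler name CollectText are made up for illustration. A piece
              may split a tag or a run of text, and the parser holds such
              data back until it can complete it.

```tcl
package require tdom

# Illustrative handler: concatenate all character data.
set ::text ""
proc CollectText {data} {
    append ::text $data
}

set parser [expat -characterdatacommand CollectText]

# More data will follow, so the end of each piece is not the end of input.
$parser configure -final 0
foreach chunk {{<doc>hel} {lo wor} {ld</d}} {
    $parser parse $chunk
}

# The last piece is marked final, so an incomplete document would now
# raise an error instead of being held back.
$parser configure -final 1
$parser parse {oc>}
$parser free

puts $::text ;# hello world
```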

       -baseurl  url

              Reports the base URL of the document to the parser.

       -elementstartcommand  script

              Specifies a Tcl command to associate with the start tag of an
              element. The command actually invoked consists of this option's
              script followed by at least two arguments: the element type name
              and the attribute list.

              The attribute list is a Tcl list consisting of name/value pairs,
              suitable for passing to the array set Tcl command.

              Example:

                     proc HandleStart {name attlist} {
                         puts stderr "Element start ==> $name has attributes $attlist"
                     }

                     $parser configure -elementstartcommand HandleStart

                     $parser parse {<test id="123"></test>}

              This would result in the following command being invoked:

                     HandleStart test {id 123}

       -elementendcommand  script

              Specifies a Tcl command to associate with the end tag of an
              element. The command actually invoked consists of this option's
              script followed by at least one argument: the element type name.
              In addition, if the -reportempty option is set then the command
              may be invoked with the -empty configuration option to indicate
              whether it is an empty element. See the description of the
              -reportempty option for an example.

              Example:

                     proc HandleEnd {name} {
                         puts stderr "Element end ==> $name"
                     }

                     $parser configure -elementendcommand HandleEnd

                     $parser parse {<test id="123"></test>}

              This would result in the following command being invoked:

                     HandleEnd test

       -characterdatacommand  script

              Specifies a Tcl command to associate with character data in the
              document, i.e. text. The command actually invoked consists of
              this option's script followed by one argument: the text.

              It is not guaranteed that character data will be passed to the
              application in a single call to this command. The application
              should therefore be prepared to receive multiple invocations of
              this callback with no intervening callbacks from other features.

              Example:

                     proc HandleText {data} {
                         puts stderr "Character data ==> $data"
                     }

                     $parser configure -characterdatacommand HandleText

                     $parser parse {<test>this is a test document</test>}

              This would result in the following command being invoked:

                     HandleText {this is a test document}
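
              Because one run of text may arrive in several calls, a common
              pattern is to buffer character data and flush it at element
              boundaries. The following sketch shows that approach.
              BufferText, FlushText, ::buffer and ::texts are hypothetical
              names. The entity reference is included because text around
              it may be delivered in pieces.

```tcl
package require tdom

# Illustrative handlers: accumulate text, then flush it as one piece
# whenever an element starts or ends.
set ::buffer ""
set ::texts {}
proc BufferText {data} {
    append ::buffer $data
}
proc FlushText {args} {
    if {$::buffer ne ""} {
        lappend ::texts $::buffer
        set ::buffer ""
    }
}

set parser [expat -characterdatacommand BufferText \
                  -elementstartcommand FlushText \
                  -elementendcommand FlushText]
$parser parse {<t>a &amp; b<u>x</u></t>}
$parser free
```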

       -processinginstructioncommand  script

              Specifies a Tcl command to associate with processing
              instructions in the document. The command actually invoked
              consists of this option's script followed by two arguments: the
              PI target and the PI data.

              Example:

                     proc HandlePI {target data} {
                         puts stderr "Processing instruction ==> $target $data"
                     }

                     $parser configure -processinginstructioncommand HandlePI

                     $parser parse {<test><?special this is a processing instruction?></test>}

              This would result in the following command being invoked:

                     HandlePI special {this is a processing instruction}

       -notationdeclcommand  script

              Specifies a Tcl command to associate with notation declarations
              in the document. The command actually invoked consists of this
              option's script followed by four arguments: the notation name,
              the base URI of the document (that is, whatever was set by the
              -baseurl option), the system identifier and the public
              identifier. The notation name is never empty. The other
              arguments may be empty.

       -externalentitycommand  script

              Specifies a Tcl command to associate with references to external
              entities in the document. The command actually invoked consists
              of this option's script followed by three arguments: the base
              URI, the system identifier of the entity and the public
              identifier of the entity. The base URI and the public identifier
              may be the empty list.

              This handler script has to return a Tcl list of three elements:

              - The first element states how the external entity is returned
                to the processor. At the moment, the three allowed types are
                "string", "channel" and "filename".

              - The second element is the (absolute) base URI of the external
                entity to be parsed.

              - The third element is the data itself. For type "string" it is
                the already read content of the external entity. For type
                "channel" it is the name of a Tcl channel. For type
                "filename" it is the path to the external entity to be read.

              Behind the scenes, the external entity referenced by the
              returned Tcl channel, string or file name is parsed with an
              expat external entity parser that uses the same handler sets as
              the main parser. If parsing of the external entity fails, the
              whole parse is stopped with an error message.

              A Tcl command registered as externalentitycommand that cannot
              resolve an external entity may return TCL_CONTINUE (at the Tcl
              level, the "continue" return code). In this case, the wrapper
              gives the next registered externalentitycommand a try. If no
              externalentitycommand can handle the external entity, parsing
              stops with an error.

              Example:

                     proc externalEntityRefHandler {base systemId publicId} {
                         if {![regexp {^[a-zA-Z]+:/} $systemId]} {
                             # Relative system id: resolve it against the
                             # directory of the base URI (scheme stripped).
                             regsub {^[a-zA-Z]+:} $base {} base
                             set basedir [file dirname $base]
                             set systemId "$basedir/$systemId"
                         } else {
                             # Absolute URI: strip the scheme to get a path.
                             regsub {^[a-zA-Z]+:} $systemId {} systemId
                         }
                         if {[catch {set fd [open $systemId]}]} {
                             return -code error \
                                     "Failed to open external entity $systemId"
                         }
                         return [list channel $systemId $fd]
                     }

                     set parser [expat -externalentitycommand externalEntityRefHandler \
                                       -baseurl "file:///local/doc/doc.xml" \
                                       -paramentityparsing notstandalone]
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test SYSTEM "test.dtd">
                     <test/>}

              This would result in the following command being invoked:

                     externalEntityRefHandler file:///local/doc/doc.xml test.dtd {}

              The parser calls this handler script to resolve external
              entities only when necessary. As a result, external parameter
              entities trigger this handler only in two cases: when
              -paramentityparsing is set to "always", or when it is set to
              "notstandalone" and the document is not marked as standalone.
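
              Besides "channel", the handler may return the entity text
              directly with type "string". The sketch below assumes an
              in-memory table. The ::catalog array and the proc names are
              hypothetical. For simplicity, the system identifier is returned
              as the base URI. -paramentityparsing is set to "always" so that
              the external DTD is actually read.

```tcl
package require tdom

# Hypothetical in-memory catalogue: system identifier -> entity text.
set ::catalog(test.dtd) {<!ENTITY greeting "hello">}

# Resolve external entities from the catalogue using type "string".
proc ResolveFromCatalog {base systemId publicId} {
    if {![info exists ::catalog($systemId)]} {
        return -code error "cannot resolve external entity $systemId"
    }
    return [list string $systemId $::catalog($systemId)]
}

set ::text ""
proc CollectText {data} {
    append ::text $data
}

set parser [expat -externalentitycommand ResolveFromCatalog \
                  -paramentityparsing always \
                  -characterdatacommand CollectText]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE test SYSTEM "test.dtd">
<test>&greeting;</test>}
$parser free
```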

       -unknownencodingcommand  script

              Not implemented at Tcl level.

       -startnamespacedeclcommand  script

              Specifies a Tcl command to associate with the start scope of
              namespace declarations in the document. The command actually
              invoked consists of this option's script followed by two
              arguments: the namespace prefix and the namespace URI. For an
              xmlns attribute (a default namespace declaration), the prefix
              will be the empty list. For an xmlns="" attribute, the URI will
              be the empty list. The calls to the start and end element
              handlers occur between the calls to the start and end namespace
              declaration handlers.

       -endnamespacedeclcommand  script

              Specifies a Tcl command to associate with the end scope of
              namespace declarations in the document. The command actually
              invoked consists of this option's script followed by the
              namespace prefix as its argument. For an xmlns attribute, the
              prefix will be the empty list. The calls to the start and end
              element handlers occur between the calls to the start and end
              namespace declaration handlers.
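
              A sketch of both namespace declaration handlers. NsStart, NsEnd
              and ::decls are hypothetical names. The parser is created with
              -namespace, because namespace parsing can only be enabled at
              creation time.

```tcl
package require tdom

# Illustrative handlers: record the order of namespace scope events.
set ::decls {}
proc NsStart {prefix uri} {
    lappend ::decls start $prefix $uri
}
proc NsEnd {prefix} {
    lappend ::decls end $prefix
}

set parser [expat -namespace \
                  -startnamespacedeclcommand NsStart \
                  -endnamespacedeclcommand NsEnd]
$parser parse {<p:doc xmlns:p="urn:example"><p:item/></p:doc>}
$parser free
```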

       -commentcommand  script

              Specifies a Tcl command to associate with comments in the
              document. The command actually invoked consists of this option's
              script followed by one argument: the comment data.

              Example:

                     proc HandleComment {data} {
                         puts stderr "Comment ==> $data"
                     }

                     $parser configure -commentcommand HandleComment

                     $parser parse {<test><!-- this is <obviously> a comment --></test>}

              This would result in the following command being invoked:

                     HandleComment { this is <obviously> a comment }

       -notstandalonecommand  script

              This Tcl command is called if the document is not standalone
              (it has an external subset or a reference to a parameter entity,
              but does not have standalone="yes"). It is called with no
              additional arguments.

       -startcdatasectioncommand  script

              Specifies a Tcl command to associate with the start of a CDATA
              section. It is called with no additional arguments.

       -endcdatasectioncommand  script

              Specifies a Tcl command to associate with the end of a CDATA
              section. It is called with no additional arguments.
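
              A sketch of the CDATA section handlers. The scripts take no
              extra arguments, so plain incr commands work as handlers. The
              section content itself is delivered through
              -characterdatacommand. The variable names are illustrative.

```tcl
package require tdom

set ::cdataStarts 0
set ::cdataEnds 0
set ::text ""
proc CollectText {data} {
    append ::text $data
}

# Both CDATA handlers are called with no arguments appended.
set parser [expat -startcdatasectioncommand {incr ::cdataStarts} \
                  -endcdatasectioncommand {incr ::cdataEnds} \
                  -characterdatacommand CollectText]
$parser parse {<doc><![CDATA[<raw> & unescaped]]></doc>}
$parser free
```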

       -elementdeclcommand  script

              Specifies a Tcl command to associate with element declarations.
              The command actually invoked consists of this option's script
              followed by two arguments: the name of the element and the
              content model. The content model argument is a Tcl list of four
              elements:

              - The first list element specifies the type of the XML element.
                The six possible types are reported as "MIXED", "NAME",
                "EMPTY", "CHOICE", "SEQ" or "ANY".

              - The second list element reports the quantifier of the content
                model in XML syntax ("?", "*" or "+"), or is the empty list.

              - The third list element is the name if the type is "NAME".
                Otherwise it is the empty list.

              - The fourth list element is used by the "MIXED", "CHOICE" and
                "SEQ" types, as described below.

              If the type is "MIXED", the quantifier is either "{}" or "*".
              "{}" indicates a PCDATA-only element. "*" means the fourth
              element is a Tcl list of the elements allowed to intermix with
              PCDATA. If the type is "CHOICE" or "SEQ", the fourth element
              contains a list of content models built like this one. The
              "EMPTY", "ANY" and "MIXED" types only occur at top level.

              Examples:

                     proc elDeclHandler {name content} {
                          puts "$name $content"
                     }

                     set parser [expat -elementdeclcommand elDeclHandler]
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test [
                     <!ELEMENT test (#PCDATA)>
                     ]>
                     <test>foo</test>}

              This would result in the following command being invoked:

                     elDeclHandler test {MIXED {} {} {}}

                     $parser reset
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test [
                     <!ELEMENT test (a|b)>
                     ]>
                     <test><a/></test>}

              This would result in the following command being invoked:

                     elDeclHandler test {CHOICE {} {} {{NAME {} a {}} {NAME {} b {}}}}

       -attlistdeclcommand  script

              Specifies a Tcl command to associate with attlist declarations.
              The command actually invoked consists of this option's script
              followed by five arguments. The attlist declaration handler is
              called for *each* attribute, so a single attlist declaration
              that declares several attributes generates several calls to this
              handler. The arguments are:

              - the name of the element this attribute belongs to;
              - the name of the attribute;
              - the type of the attribute;
              - the default value, which may be the empty list;
              - a required flag.

              If the required flag is true and the default value is not the
              empty list, then this is a "#FIXED" default.

              Example:

                     proc attlistHandler {elname name type default isRequired} {
                         puts "$elname $name $type $default $isRequired"
                     }

                     set parser [expat -attlistdeclcommand attlistHandler]
                     $parser parse {<?xml version='1.0'?>
                     <!DOCTYPE test [
                     <!ELEMENT test EMPTY>
                     <!ATTLIST test
                               id      ID      #REQUIRED
                               name    CDATA   #IMPLIED>
                     ]>
                     <test/>}

              This would result in the following commands being invoked:

                     attlistHandler test id ID {} 1
                     attlistHandler test name CDATA {} 0

       -startdoctypedeclcommand  script

              Specifies a Tcl command to associate with the start of the
              DOCTYPE declaration. This command is called before any DTD or
              internal subset is parsed. The command actually invoked consists
              of this option's script followed by four arguments: the doctype
              name, the system identifier, the public identifier and a boolean
              that shows whether the DOCTYPE has an internal subset.

       -enddoctypedeclcommand  script

              Specifies a Tcl command to associate with the end of the DOCTYPE
              declaration. This command is called after processing any
              external subset. It is called with no additional arguments.
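
              A sketch of the two DOCTYPE handlers applied to a document with
              an internal subset. DoctypeStart, ::doctype and ::doctypeEnded
              are hypothetical names. The sketch keeps only the doctype name
              and the internal-subset flag.

```tcl
package require tdom

set ::doctype {}
set ::doctypeEnded 0
# Illustrative handler: receives the four documented arguments.
proc DoctypeStart {name systemId publicId hasInternalSubset} {
    set ::doctype [list $name $hasInternalSubset]
}

# The end handler gets no extra arguments, so a plain set works.
set parser [expat -startdoctypedeclcommand DoctypeStart \
                  -enddoctypedeclcommand {set ::doctypeEnded 1}]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE test [
<!ELEMENT test EMPTY>
]>
<test/>}
$parser free
```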

       -paramentityparsing  never|notstandalone|always

              "never" disables expansion of parameter entities. "always"
              always expands them. "notstandalone" expands them only if the
              document is not marked standalone="yes". The default is
              "never".

       -entitydeclcommand  script

              Specifies a Tcl command to associate with any entity
              declaration. The command actually invoked consists of this
              option's script followed by seven arguments: the entity name, a
              boolean identifying parameter entities, the value of the entity,
              the base URI, the system identifier, the public identifier and
              the notation name. Depending on the type of entity declaration,
              some of these arguments may be the empty list.
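
              A sketch of an entity declaration handler for an internal
              general entity. EntityDecl and ::entities are hypothetical
              names. Only the name, the parameter-entity flag and the value
              are kept. The remaining arguments are accepted but unused.

```tcl
package require tdom

set ::entities {}
# Illustrative handler taking the seven documented arguments.
proc EntityDecl {name isParameter value base systemId publicId notation} {
    lappend ::entities [list $name $isParameter $value]
}

set parser [expat -entitydeclcommand EntityDecl]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE doc [
<!ENTITY company "Example Corp">
]>
<doc/>}
$parser free
```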

       -ignorewhitecdata  boolean

              If this flag is set, element content that contains only
              whitespace is not reported with the -characterdatacommand.

       -ignorewhitespace  boolean

              Another name for -ignorewhitecdata; see there.
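
              A sketch of -ignorewhitecdata on an indented document. The
              newlines and indentation between elements are whitespace-only
              runs, so only the element text should reach the handler.
              CollectChunk and ::chunks are illustrative names.

```tcl
package require tdom

set ::chunks {}
proc CollectChunk {data} {
    lappend ::chunks $data
}

# Whitespace-only runs between the elements are suppressed.
set parser [expat -ignorewhitecdata 1 -characterdatacommand CollectChunk]
$parser parse "<doc>\n  <a>x</a>\n  <b>y</b>\n</doc>"
$parser free
```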

       -handlerset  name

              This option sets the Tcl handler set scope for the configure
              options. Any option/value pairs following this option in the
              same call to the parser modify the named Tcl handler set. If you
              don't use this option, you are modifying the default Tcl handler
              set, named "default".
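
              A sketch with two Tcl handler sets receiving the same event. It
              assumes that naming a handler set that does not exist yet
              creates it. The set name "second" and the proc LogStart are
              hypothetical. The check sorts the log because the order in
              which the sets are called is not stated here.

```tcl
package require tdom

set ::log {}
# Illustrative handler: the first argument names the calling handler set.
proc LogStart {setName name attlist} {
    lappend ::log "$setName:$name"
}

set parser [expat -elementstartcommand {LogStart default}]
# Options following -handlerset apply only to the named set.
$parser configure -handlerset second -elementstartcommand {LogStart second}
$parser parse {<doc/>}
$parser free
```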

       -noexpand  boolean

              Normally, the parser tries to expand references to entities
              defined in the internal subset. If this option is set to a true
              value, these entities are not expanded but are reported
              literally via the default handler.

              Warning: if you set this option to true, install a default
              handler (with the -defaultcommand option) for every handler set
              of the parser. In any handler set without a default handler,
              all internal entities are silently lost.

       -useForeignDTD  <boolean>

              If <boolean> is true and the document does not have an external
              subset, the parser will call the -externalentitycommand script
              with empty values for the systemId and publicId arguments.

              This option must be set before the first piece of data is
              parsed. Setting it after parsing has started has no effect. The
              default is not to use a foreign DTD, and this default is
              restored when the parser is reset.

              Note that a -paramentityparsing value of "never" (the default)
              suppresses any call to the -externalentitycommand script. Note
              also that if the document has no internal subset either, the
              -startdoctypedeclcommand and -enddoctypedeclcommand scripts, if
              set, are not called.

COMMAND METHODS

       parser configure option value ?option value?

              Sets configuration options for the parser. Every command option
              except -namespace can be set or modified with this method.

       parser cget ?-handlerset name? option

              Returns the current value of the configuration option option for
              the parser.

              If the -handlerset option is used, the configuration for the
              named handler set is returned.

       parser free

              Deletes the parser and the parser command. A parser cannot be
              freed from within one of its handler callbacks, whether directly
              or indirectly. Attempting to do so raises a Tcl error.

       parser get -specifiedattributecount|-idattributeindex|-currentbytecount|
       -currentlinenumber|-currentcolumnnumber|-currentbyteindex

              -specifiedattributecount

                     Returns the number of attribute/value pairs passed in the
                     last call to the elementstartcommand that were specified
                     in the start-tag rather than defaulted. Each
                     attribute/value pair counts as 2, so this corresponds to
                     an index into the attribute list passed to the
                     elementstartcommand.

              -idattributeindex

                     Returns the index of the ID attribute passed in the last
                     call to the elementstartcommand, or -1 if there is no ID
                     attribute. Each attribute/value pair counts as 2, so this
                     corresponds to an index into the attribute list passed to
                     the elementstartcommand.

              -currentbytecount

                     Returns the number of bytes in the current event. Returns
                     0 if the event is in an internal entity.

              -currentlinenumber

                     Returns the line number of the current parse location.

              -currentcolumnnumber

                     Returns the column number of the current parse location.

              -currentbyteindex

                     Returns the byte index of the current parse location.

              Only one value may be requested at a time.
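
              A sketch of querying the parse position from inside a handler.
              The parser command is passed to the hypothetical handler
              RecordLine as a leading argument. This works because the
              registered script is a command prefix.

```tcl
package require tdom

set ::positions {}
# Illustrative handler: record each element with its line number.
proc RecordLine {parserCmd name attlist} {
    lappend ::positions $name [$parserCmd get -currentlinenumber]
}

set parser [expat]
$parser configure -elementstartcommand [list RecordLine $parser]
$parser parse "<doc>\n<a/>\n\n<b/>\n</doc>"
$parser free

puts $::positions ;# doc 1 a 2 b 4
```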

       parser parse data

              Parses the XML string data. The event callback scripts are
              called as their triggering events occur. This method cannot be
              used from within a callback of the same parser, whether directly
              or indirectly, and raises an error in that case.

       parser parsechannel channelID

              Reads the XML data from the Tcl channel channelID up to the end
              of file and parses that data. Reading starts at the current
              access position, without any seek. The channel encoding is
              respected. To open a file for use with this method, use the
              helper proc tDOM::xmlOpenFile from the tDOM script library. This
              method cannot be used from within a callback of the same parser,
              whether directly or indirectly, and raises an error in that
              case.
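
              A sketch of parsechannel combined with tDOM::xmlOpenFile. It
              assumes Tcl 8.6 or later for file tempfile, and that
              tDOM::xmlOpenFile is available after package require tdom. The
              temporary file and its content are made up for illustration.

```tcl
package require tdom

# Write a small UTF-8 sample document to a temporary file.
set out [file tempfile path]
fconfigure $out -encoding utf-8
puts -nonewline $out {<?xml version="1.0" encoding="UTF-8"?>}
puts -nonewline $out "\n<doc>caf\u00e9</doc>"
close $out

set ::text ""
proc CollectText {data} {
    append ::text $data
}

# Open the file with a channel encoding that matches the document.
set in [tDOM::xmlOpenFile $path]
set parser [expat -characterdatacommand CollectText]
$parser parsechannel $in
$parser free
close $in
file delete $path
```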

       parser parsefile filename

              Reads the XML data directly from the file named filename and
              parses that data. This is done with low-level file operations.
              The XML data must be in US-ASCII, ISO-8859-1, UTF-8 or UTF-16
              encoding. Where applicable, this is the fastest way to parse XML
              data. This method cannot be used from within a callback of the
              same parser, whether directly or indirectly, and raises an error
              in that case.

       parser reset

              Resets the parser in preparation for parsing another document. A
              parser cannot be reset from within one of its handler callbacks,
              whether directly or indirectly. Attempting to do so raises a Tcl
              error.
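
              A sketch of recovering from a parse error with reset. Handlers
              registered before the reset are assumed to stay in place, as the
              -elementdeclcommand examples also rely on. CountStart and
              ::count are illustrative names.

```tcl
package require tdom

set ::count 0
proc CountStart {name attlist} {
    incr ::count
}
set parser [expat -elementstartcommand CountStart]

# A mismatched end tag makes the parse method raise a Tcl error.
set failed [catch {$parser parse {<a><b></a>}} msg]

# After a reset, the same parser and handlers accept a new document.
$parser reset
set ::count 0
$parser parse {<doc><x/><y/></doc>}
$parser free
```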

Callback Command Return Codes

       A script invoked for any of the parser callback commands, such as
       -elementstartcommand, -elementendcommand, etc., may return a return
       code other than "ok" or "error". In particular, all callbacks may also
       return "break" or "continue".

       If a callback script returns an "error" return code then processing
       of the document is terminated and the error is propagated in the
       usual fashion.

       If a callback script returns a "break" return code then all further
       processing of every handler script in this Tcl handler set is
       suppressed for the rest of the parse. This does not influence any
       other handler set.
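
       A sketch of "break": after the second element, this handler set stops
       receiving events, while parsing itself continues. FirstTwo and ::seen
       are illustrative names.

```tcl
package require tdom

set ::seen {}
# Illustrative handler: stop this handler set after two elements.
proc FirstTwo {name attlist} {
    lappend ::seen $name
    if {[llength $::seen] == 2} {
        return -code break
    }
}

set parser [expat -elementstartcommand FirstTwo]
$parser parse {<doc><a/><b/><c/></doc>}
$parser free

puts $::seen ;# doc a
```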

       If a callback script returns a "continue" return code then processing
       of the current element and its children ceases for every handler
       script in this Tcl handler set, and processing continues with the next
       (sibling) element. This does not influence any other handler set.
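
       A sketch of "continue" used to skip a subtree. The start tag of the
       element itself is still seen, but its children are not. The element
       name "private" and the proc SkipPrivate are hypothetical.

```tcl
package require tdom

set ::seen {}
# Illustrative handler: skip the subtree of every <private> element.
proc SkipPrivate {name attlist} {
    lappend ::seen $name
    if {$name eq "private"} {
        return -code continue
    }
}

set parser [expat -elementstartcommand SkipPrivate]
$parser parse {<doc><a/><private><secret/></private><b/></doc>}
$parser free
```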

SEE ALSO

       expatapi, tdom

KEYWORDS

       SAX

Tcl                                                                   expat(n)