xsxp(n)                 Amazon S3 Web Service Utilities                xsxp(n)

______________________________________________________________________________

NAME

       xsxp - eXtremely Simple Xml Parser

SYNOPSIS

       package require Tcl 8.4

       package require xsxp 1

       package require xml

       xsxp::parse xml

       xsxp::fetch pxml path ?part?

       xsxp::fetchall pxml_list path ?part?

       xsxp::only pxml tagname

       xsxp::prettyprint pxml ?chan?

______________________________________________________________________________

DESCRIPTION

       This package provides a simple interface to parse XML into a pure-value
       list. It also provides accessor routines to pull out specific subtags,
       not unlike DOM access. This package was written for and is used by
       Darren New's Amazon S3 access package.

       This is pretty lame, but I needed something like this for S3, and at
       the time, TclDOM would not work with the new 8.5 Tcl due to version
       number problems.

       In addition, this is a pure-value implementation. There is no garbage
       to clean up in the event of a thrown error, for example. This
       simplifies the code for sufficiently small XML documents, which is what
       Amazon's S3 guarantees.

       Copyright 2006 Darren New. All Rights Reserved. NO WARRANTIES OF ANY
       TYPE ARE PROVIDED. COPYING OR USE INDEMNIFIES THE AUTHOR IN ALL WAYS.
       This software is licensed under essentially the same terms as Tcl. See
       LICENSE.txt for the terms.

COMMANDS

       The package implements five rather simple procedures. One parses, one
       is for debugging, and the rest pull various parts of the parsed
       document out for processing.

       xsxp::parse xml
              This parses an XML document (using the standard xml tcllib
              module in a SAX sort of way) and builds a data structure which
              it returns if the parsing succeeded. The return value is
              referred to herein as a "pxml", or "parsed xml". The list
              consists of two or more elements:

              •      The first element is the name of the tag.

              •      The second element is an array-get formatted list of
                     key/value pairs. The keys are attribute names and the
                     values are attribute values. This is an empty list if
                     there are no attributes on the tag.

              •      The third through end elements are the children of the
                     node, if any. Each child is, recursively, a pxml.

              •      Note that if the zeroth element, i.e. the tag name, is
                     "%PCDATA", then the attributes will be empty and the
                     third element will be the text of the element. In
                     addition, if an element's contents consist only of
                     PCDATA, it will have only one child, and all the PCDATA
                     will be concatenated. In other words, this parser works
                     poorly for XML with elements that contain both child
                     tags and PCDATA. Since Amazon S3 does not do this (and
                     for that matter most uses of XML where XML is a poor
                     choice don't do this), this is probably not a serious
                     limitation.

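              As a rough illustration of the structure described above, the
              following sketch builds a pxml by hand for a small, made-up
              document and picks it apart with ordinary list commands. The
              document and the variable names are assumptions chosen for
              illustration; real pxml values come from xsxp::parse.

```tcl
# Hand-written pxml for the hypothetical document
#   <owner id="42"><name>Alice</name></owner>
# laid out as: tag, attribute list, then zero or more child pxmls.
set pxml {owner {id 42} {name {} {%PCDATA {} Alice}}}

set tag      [lindex $pxml 0]      ;# tag name of the root
set attrs    [lindex $pxml 1]      ;# array-get style key/value list
set children [lrange $pxml 2 end]  ;# each child is itself a pxml

# The attribute list can be loaded into an array with "array set".
array set a $attrs

# The single child <name> holds one %PCDATA node; its third element is the text.
set name [lindex $children 0]
set text [lindex $name 2 2]

puts "$tag id=$a(id) name=$text"   ;# prints: owner id=42 name=Alice
```
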
       xsxp::fetch pxml path ?part?
              pxml is a parsed XML, as returned from xsxp::parse. path is a
              list of element tag names. Each element is the name of a child
              to look up, optionally followed by a hash ("#") and a string of
              digits. An empty list or an initial empty element selects pxml.
              If no hash sign is present, the behavior is as if "#0" had been
              appended to that element. (In addition to a list, slashes can
              separate subparts where convenient.)

              An element of path scans the children at the indicated level for
              the n'th instance (counting from zero) of a child whose tag
              matches the part of the element before the hash sign. If an
              element is simply "#" followed by digits, that indexed child is
              selected, regardless of the tags in the children. Hence, an
              element of "#3" will always select the fourth child of the node
              under consideration.

              part defaults to "%ALL". It can be one of the following
              case-sensitive terms:

              %ALL   returns the entire selected element.

              %TAGNAME
                     returns lindex 0 of the selected element.

              %ATTRIBUTES
                     returns lindex 1 of the selected element.

              %CHILDREN
                     returns lrange 2 through end of the selected element,
                     resulting in a list of elements being returned.

              %PCDATA
                     returns a concatenation of all the bodies of direct
                     children of this node whose tag is %PCDATA. It throws an
                     error if no such children are found. That is,
                     part=%PCDATA means return the textual content found in
                     that node but not its children nodes.

              %PCDATA?
                     is like %PCDATA, but returns an empty string if no PCDATA
                     is found.

              For example, to fetch the first bold text from the fifth
              paragraph of the body of your HTML file:

                     xsxp::fetch $pxml {body p#4 b} %PCDATA

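              A short usage sketch of paths and parts might look like the
              following. It assumes that the xsxp package and the xml package
              it requires are installed; the input document and its element
              names are made up for illustration.

```tcl
package require xsxp

# Hypothetical input, written without whitespace between tags so that every
# element contains either child tags or text, never both.
set xml {<list><item id="a">one</item><item id="b">two</item></list>}
set pxml [xsxp::parse $xml]

# An empty path selects pxml itself; %TAGNAME is lindex 0 of it.
set rootTag [xsxp::fetch $pxml {} %TAGNAME]

# "item" behaves like "item#0": the first child whose tag is item.
set firstText [xsxp::fetch $pxml item %PCDATA]

# "item#1" is the second item child; %ATTRIBUTES is its key/value list.
set secondAttrs [xsxp::fetch $pxml {item#1} %ATTRIBUTES]

# "#1" selects the second child regardless of its tag.
set secondText [xsxp::fetch $pxml {#1} %PCDATA]

puts [list $rootTag $firstText $secondAttrs $secondText]
```
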
       xsxp::fetchall pxml_list path ?part?
              This iterates over each pxml in pxml_list (which must be a list
              of pxmls), selecting the indicated path from it, building a new
              list with the selected data, and returning that new list.

              For example, pxml_list might be the %CHILDREN of a particular
              element, and the path and part might select from each child a
              sub-element in which we're interested.

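              The %CHILDREN case just described could be sketched as follows.
              The pxml here is written by hand with hypothetical tag names
              rather than parsed, and the sketch assumes the xsxp package and
              its xml dependency are installed.

```tcl
package require xsxp

# Hand-written pxml standing in for a parsed listing (hypothetical tags):
# a root with no attributes and two <bucket> children.
set pxml {
    buckets {}
    {bucket {} {name {} {%PCDATA {} alpha}}}
    {bucket {} {name {} {%PCDATA {} beta}}}
}

# Every direct child of the root, as a list of pxmls.
set children [xsxp::fetch $pxml {} %CHILDREN]

# From each child, select the text of its <name> sub-element.
set names [xsxp::fetchall $children name %PCDATA]
puts $names
```
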
       xsxp::only pxml tagname
              This iterates over the direct children of pxml and selects only
              those with tagname as their tag. Returns a list of matching
              elements.

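              As a small sketch of filtering by tag, the following keeps only
              the children with one tag and then extracts their text. The pxml
              is hand-written with hypothetical tag names, and the xsxp
              package and its xml dependency are assumed to be installed.

```tcl
package require xsxp

# Hand-written pxml with mixed children (hypothetical tags).
set pxml {
    entry {}
    {key {} {%PCDATA {} a.txt}}
    {size {} {%PCDATA {} 12}}
    {key {} {%PCDATA {} b.txt}}
}

# Keep only the direct children tagged <key>; the result is a list of pxmls.
set keys [xsxp::only $pxml key]

# Each match is a pxml, so the list can be handed on to xsxp::fetchall.
# The empty path selects each matched element itself.
set texts [xsxp::fetchall $keys {} %PCDATA]
puts $texts
```
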
       xsxp::prettyprint pxml ?chan?
              This outputs to chan (default stdout) a pretty-printed version
              of pxml.

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category amazon-s3
       of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please
       also report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       dom, parser, xml

CATEGORY

       Text processing

COPYRIGHT

       Copyright 2006 Darren New. All Rights Reserved.

tcllib                                1.0                              xsxp(n)