xsxp(n)                   eXtremely Simple Xml Parser                  xsxp(n)

______________________________________________________________________________

NAME

       xsxp - eXtremely Simple Xml Parser

SYNOPSIS

       package require Tcl 8.4

       package require xml

       xsxp::parse xml

       xsxp::fetch pxml path ?part?

       xsxp::fetchall pxml_list path ?part?

       xsxp::only pxml tagname

       xsxp::prettyprint pxml ?chan?

_________________________________________________________________

DESCRIPTION

       This package provides a simple interface to parse XML into a pure-value
       list. It also provides accessor routines to pull out specific subtags,
       not unlike DOM access. This package was written for and is used by
       Darren New's Amazon S3 access package.

       This is pretty lame, but I needed something like this for S3, and at
       the time, TclDOM would not work with the new 8.5 Tcl due to version
       number problems.

       In addition, this is a pure-value implementation. There is no garbage
       to clean up in the event of a thrown error, for example. This
       simplifies the code for sufficiently small XML documents, which is what
       Amazon's S3 guarantees.

       Copyright 2006 Darren New. All Rights Reserved. NO WARRANTIES OF ANY
       TYPE ARE PROVIDED. COPYING OR USE INDEMNIFIES THE AUTHOR IN ALL WAYS.
       This software is licensed under essentially the same terms as Tcl. See
       LICENSE.txt for the terms.

COMMANDS

       The package implements five rather simple procedures. One parses, one
       is for debugging, and the rest pull various parts of the parsed
       document out for processing.

       xsxp::parse xml
              This parses an XML document (using the standard xml tcllib
              module in a SAX sort of way) and builds a data structure which
              it returns if the parsing succeeded. The return value is
              referred to herein as a "pxml", or "parsed xml". It is a list of
              two or more elements:

              ·      The first element (index 0) is the name of the tag.

              ·      The second element is an array-get formatted list of
                     key/value pairs. The keys are attribute names and the
                     values are attribute values. This is an empty list if
                     there are no attributes on the tag.

              ·      The third through end elements are the children of the
                     node, if any. Each child is, recursively, a pxml.

              ·      Note that if the first element, i.e. the tag name, is
                     "%PCDATA", then the attributes will be empty and the
                     third element will be the text of the element. In
                     addition, if an element's contents consist only of
                     PCDATA, it will have only one child, and all the PCDATA
                     will be concatenated. In other words, this parser works
                     poorly for XML with elements that contain both child
                     tags and PCDATA. Since Amazon S3 does not do this (and
                     for that matter most uses of XML where XML is a poor
                     choice don't do this), this is probably not a serious
                     limitation.
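              As an illustration of this layout, the sketch below writes out
              a pxml by hand and takes it apart with ordinary list commands.
              The sample document and the variable names are made up for the
              example, and the value is constructed from the description
              above rather than captured from xsxp::parse.

```tcl
# Hand-built pxml for: <item id="7"><name>disk</name><size>40</size></item>
# (written out following the structure described above; not parser output)
set pxml {item {id 7} {name {} {%PCDATA {} disk}} {size {} {%PCDATA {} 40}}}

set tag [lindex $pxml 0]             ;# tag name
array set attrs [lindex $pxml 1]     ;# attributes, array-get format
set children [lrange $pxml 2 end]    ;# each child is itself a pxml

puts "tag=$tag id=$attrs(id) children=[llength $children]"
foreach child $children {
    # A text-only element has a single %PCDATA child; its text is at index 2.
    puts "[lindex $child 0]: [lindex $child 2 2]"
}
```

              Because a pxml is an ordinary Tcl value, nothing has to be
              freed or destroyed once the script is done with it.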
       xsxp::fetch pxml path ?part?
              pxml is a parsed XML, as returned from xsxp::parse. path is a
              list of element tag names. Each element is the name of a child
              to look up, optionally followed by a hash ("#") and a string of
              digits. An empty list or an initial empty element selects pxml.
              If no hash sign is present, the behavior is as if "#0" had been
              appended to that element. (In addition to a list, slashes can
              separate subparts where convenient.)

              An element of path scans the children at the indicated level for
              the n'th instance (counting from zero) of a child whose tag
              matches the part of the element before the hash sign. If an
              element is simply "#" followed by digits, that indexed child is
              selected, regardless of the tags in the children. Hence, an
              element of "#3" will always select the fourth child of the node
              under consideration.
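              The selection rule for a single path element can be pictured
              with a small pure-Tcl sketch. selectChild is a hypothetical
              helper written for this page, not a command of the package, and
              raising an error for a missing child is an assumption of the
              sketch; it only covers the forms "tag", "tag#n" and "#n".

```tcl
# Hypothetical helper (not part of xsxp): resolve one path element such as
# "p", "p#4" or "#3" against the children of a pxml node.
proc selectChild {pxml element} {
    set children [lrange $pxml 2 end]
    # Split "tag#n"; a missing "#n" behaves like "#0".
    if {![regexp {^([^#]*)(?:#(\d+))?$} $element -> tag n]} {
        error "malformed path element \"$element\""
    }
    if {$n eq ""} {
        set n 0
    } else {
        scan $n %d n    ;# read as decimal, so "08" is 8 rather than bad octal
    }
    if {$tag eq ""} {
        # Bare "#n": plain positional index, whatever the tags are.
        set matches $children
    } else {
        set matches {}
        foreach child $children {
            if {[lindex $child 0] eq $tag} { lappend matches $child }
        }
    }
    if {$n >= [llength $matches]} {
        error "no child matches \"$element\""
    }
    return [lindex $matches $n]
}

set body {body {} {p {} {%PCDATA {} one}} {hr {}} {p {} {%PCDATA {} two}}}
puts [selectChild $body p#1]    ;# the second <p>
puts [selectChild $body #1]     ;# the second child of any tag, here <hr>
```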
              part defaults to "%ALL". It can be one of the following
              case-sensitive terms:

              %ALL   returns the entire selected element.

              %TAGNAME
                     returns lindex 0 of the selected element.

              %ATTRIBUTES
                     returns lindex 1 of the selected element.

              %CHILDREN
                     returns lrange 2 through end of the selected element,
                     resulting in a list of elements being returned.

              %PCDATA
                     returns a concatenation of all the bodies of direct
                     children of this node whose tag is %PCDATA. It throws an
                     error if no such children are found. That is,
                     part=%PCDATA means return the textual content found in
                     that node but not its children nodes.

              %PCDATA?
                     is like %PCDATA, but returns an empty string if no PCDATA
                     is found.

       For example, to fetch the first bold text from the fifth paragraph of
       the body of your HTML file:

              xsxp::fetch $pxml {html body p#4 b} %PCDATA
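              The part selectors listed for xsxp::fetch map onto plain list
              operations. The sketch below spells that mapping out; partOf is
              a hypothetical name used only here, and the code follows the
              descriptions on this page rather than the package's source.

```tcl
# Hypothetical helper (not part of xsxp): apply a "part" selector to an
# already selected pxml node, following the descriptions on this page.
proc partOf {node {part %ALL}} {
    switch -exact -- $part {
        %ALL        { return $node }
        %TAGNAME    { return [lindex $node 0] }
        %ATTRIBUTES { return [lindex $node 1] }
        %CHILDREN   { return [lrange $node 2 end] }
        %PCDATA - %PCDATA? {
            set text ""
            set found 0
            foreach child [lrange $node 2 end] {
                if {[lindex $child 0] eq "%PCDATA"} {
                    append text [lindex $child 2]
                    set found 1
                }
            }
            # Only the strict form complains when there is no text.
            if {!$found && $part eq "%PCDATA"} {
                error "no PCDATA in <[lindex $node 0]>"
            }
            return $text
        }
        default { error "unknown part \"$part\"" }
    }
}

set node {b {class x} {%PCDATA {} Hello}}
puts [partOf $node %TAGNAME]       ;# b
puts [partOf $node %ATTRIBUTES]    ;# class x
puts [partOf $node %PCDATA]        ;# Hello
puts [partOf {br {}} %PCDATA?]     ;# empty line: no text, but no error
```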
       xsxp::fetchall pxml_list path ?part?
              This iterates over each pxml in pxml_list (which must be a list
              of pxmls), selecting the indicated path from it, building a new
              list with the selected data, and returning that new list.

              For example, pxml_list might be the %CHILDREN of a particular
              element, and the path and part might select from each child a
              sub-element in which we're interested.
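              The "collect one value from each child" pattern can be sketched
              in pure Tcl as follows. childText is a hypothetical stand-in
              for a one-step fetch with part %PCDATA, and the Listing, Entry,
              Key and Size tags are invented sample data.

```tcl
# Hypothetical stand-in for a one-step fetch with part %PCDATA:
# the text of the first child of $pxml whose tag is $tag.
proc childText {pxml tag} {
    foreach child [lrange $pxml 2 end] {
        if {[lindex $child 0] eq $tag} {
            set text ""
            foreach grandchild [lrange $child 2 end] {
                if {[lindex $grandchild 0] eq "%PCDATA"} {
                    append text [lindex $grandchild 2]
                }
            }
            return $text
        }
    }
    error "no <$tag> child"
}

# Hand-built pxml with two <Entry> children (sample data, not parser output).
set listing {
    Listing {}
    {Entry {} {Key {} {%PCDATA {} a.txt}} {Size {} {%PCDATA {} 12}}}
    {Entry {} {Key {} {%PCDATA {} b.txt}} {Size {} {%PCDATA {} 340}}}
}

# Same shape as fetchall: walk a list of pxmls and collect one value each.
set keys {}
foreach entry [lrange $listing 2 end] {
    lappend keys [childText $entry Key]
}
puts $keys    ;# a.txt b.txt
```

              Going by the descriptions on this page, the corresponding call
              with the package loaded would be along the lines of
              xsxp::fetchall [xsxp::fetch $listing {} %CHILDREN] Key %PCDATA.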
       xsxp::only pxml tagname
              This iterates over the direct children of pxml and selects only
              those with tagname as their tag. Returns a list of matching
              elements.
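              In list terms, this filtering amounts to a short loop over the
              children. childrenNamed below is a hypothetical pure-Tcl
              equivalent written for illustration, not the package's code.

```tcl
# Hypothetical equivalent of the filtering described for xsxp::only.
proc childrenNamed {pxml tagname} {
    set result {}
    foreach child [lrange $pxml 2 end] {
        if {[lindex $child 0] eq $tagname} { lappend result $child }
    }
    return $result
}

set body {body {} {p {} {%PCDATA {} one}} {hr {}} {p {} {%PCDATA {} two}}}
puts [llength [childrenNamed $body p]]    ;# 2
```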
       xsxp::prettyprint pxml ?chan?
              This outputs to chan (default stdout) a pretty-printed version
              of pxml.

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category amazon-s3
       of the Tcllib SF Trackers
       [http://sourceforge.net/tracker/?group_id=12883]. Please also report
       any ideas for enhancements you may have for either package and/or
       documentation.

COPYRIGHT

       Copyright (c) 2006 Darren New. All Rights Reserved.

amazon-s3                             1.0                              xsxp(n)