xsxp(n)                Amazon S3 Web Service Utilities                xsxp(n)

______________________________________________________________________________

NAME
xsxp - eXtremely Simple Xml Parser

SYNOPSIS
package require Tcl 8.4

package require xsxp 1

package require xml

xsxp::parse xml

xsxp::fetch pxml path ?part?

xsxp::fetchall pxml_list path ?part?

xsxp::only pxml tagname

xsxp::prettyprint pxml ?chan?

______________________________________________________________________________

DESCRIPTION
This package provides a simple interface to parse XML into a pure-value
list. It also provides accessor routines to pull out specific subtags,
not unlike DOM access. This package was written for and is used by
Darren New's Amazon S3 access package.

This is pretty lame, but I needed something like this for S3, and at
the time, TclDOM would not work with the new 8.5 Tcl due to version
number problems.

In addition, this is a pure-value implementation. There is no garbage
to clean up in the event of a thrown error, for example. This simplifies
the code for sufficiently small XML documents, which is what Amazon's S3
guarantees.

Copyright 2006 Darren New. All Rights Reserved. NO WARRANTIES OF ANY
TYPE ARE PROVIDED. COPYING OR USE INDEMNIFIES THE AUTHOR IN ALL WAYS.
This software is licensed under essentially the same terms as Tcl. See
LICENSE.txt for the terms.

COMMANDS
The package implements five rather simple procedures. One parses, one
is for debugging, and the rest pull various parts of the parsed document
out for processing.

xsxp::parse xml
    This parses an XML document (using the standard xml tcllib module
    in a SAX sort of way) and builds a data structure which it returns
    if the parsing succeeded. The return value is referred to herein as
    a "pxml", or "parsed xml". The list consists of two or more
    elements:

    • The first element (index 0) is the name of the tag.

    • The second element (index 1) is an array-get formatted list of
      key/value pairs. The keys are attribute names and the values are
      attribute values. This is an empty list if there are no
      attributes on the tag.

    • The third through end elements (index 2 onward) are the children
      of the node, if any. Each child is, recursively, a pxml.

    • Note that if the first element, i.e. the tag name, is "%PCDATA",
      then the attributes will be empty and the third element will be
      the text of the element. In addition, if an element's contents
      consist only of PCDATA, it will have only one child, and all the
      PCDATA will be concatenated. In other words, this parser works
      poorly for XML with elements that contain both child tags and
      PCDATA. Since Amazon S3 does not do this (and for that matter
      most uses of XML where XML is a poor choice don't do this), this
      is probably not a serious limitation.
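
As an illustration of the layout described above, here is a small sketch that takes a pxml value apart with core Tcl list commands only. The pxml is written out by hand to follow the description (it is an assumed example value, not captured parser output), and the variable names are arbitrary.

```tcl
# Hand-built pxml for: <book id="42"><title>Tcl</title><author>DN</author></book>
# (an assumed illustrative value that follows the layout described above)
set pxml {book {id 42} {title {} {%PCDATA {} Tcl}} {author {} {%PCDATA {} DN}}}

set tag      [lindex $pxml 0]        ;# tag name
set attrs    [lindex $pxml 1]        ;# array-get style attribute list
set children [lrange $pxml 2 end]    ;# zero or more child pxmls

# The attribute list can be loaded straight into an array.
array set a $attrs

# Text of the first child: first child, then its %PCDATA child, then index 2.
set titleText [lindex $pxml 2 2 2]

puts "$tag id=$a(id) children=[llength $children] title=$titleText"
```

Because a pxml is an ordinary Tcl list, no cleanup is needed when the value goes out of scope.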

xsxp::fetch pxml path ?part?
    pxml is a parsed XML, as returned from xsxp::parse. path is a list
    of element tag names. Each element is the name of a child to look
    up, optionally followed by a hash ("#") and a string of digits. An
    empty list or an initial empty element selects pxml. If no hash
    sign is present, the behavior is as if "#0" had been appended to
    that element. (In addition to a list, slashes can separate subparts
    where convenient.)

    An element of path scans the children at the indicated level for
    the n'th instance (counting from zero, where n is the number after
    the hash sign) of a child whose tag matches the part of the element
    before the hash sign. If an element is simply "#" followed by
    digits, that indexed child is selected, regardless of the tags in
    the children. Hence, an element of "#3" will always select the
    fourth child of the node under consideration.
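
The path forms described above can be sketched as follows. This assumes the xsxp package is installed, and it uses a hand-built pxml (an assumed example value) so that the result does not depend on parser details.

```tcl
package require xsxp

# Hand-built pxml for <list><item>a</item><note>n</note><item>b</item></list>
set pxml {list {} {item {} {%PCDATA {} a}} {note {} {%PCDATA {} n}} {item {} {%PCDATA {} b}}}

set first  [xsxp::fetch $pxml item]        ;# no hash: treated like item#0
set second [xsxp::fetch $pxml item#1]      ;# second child tagged item; note is skipped
set byPos  [xsxp::fetch $pxml {#1}]        ;# second child, whatever its tag
set nested [xsxp::fetch $pxml item#1/#0]   ;# slash form of the list {item#1 #0}

puts [lindex $first 0]
puts [lindex $second 2 2]
puts [lindex $byPos 0]
puts [lindex $nested 0]
```

Each call returns a whole pxml because part is left at its default of %ALL.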

    part defaults to "%ALL". It can be one of the following
    case-sensitive terms:

    %ALL   returns the entire selected element.

    %TAGNAME
           returns lindex 0 of the selected element.

    %ATTRIBUTES
           returns lindex 1 of the selected element.

    %CHILDREN
           returns lrange 2 through end of the selected element,
           resulting in a list of elements being returned.

    %PCDATA
           returns a concatenation of all the bodies of direct children
           of this node whose tag is %PCDATA. It throws an error if no
           such children are found. That is, part=%PCDATA means return
           the textual content found in that node but not its children
           nodes.

    %PCDATA?
           is like %PCDATA, but returns an empty string if no PCDATA is
           found.

    For example, to fetch the first bold text from the fifth paragraph
    of the body of your HTML file,

        xsxp::fetch $pxml {body p#4 b} %PCDATA
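
To compare the part selectors side by side, the following sketch applies each of them to the same node. It assumes the xsxp package is installed and uses a hand-built pxml (an assumed example value); the element names are made up for the illustration.

```tcl
package require xsxp

# Hand-built pxml for <doc><p class="intro">hello</p><hr/></doc>
set pxml {doc {} {p {class intro} {%PCDATA {} hello}} {hr {}}}

puts [xsxp::fetch $pxml p %TAGNAME]      ;# tag name of the selected p element
puts [xsxp::fetch $pxml p %ATTRIBUTES]   ;# its key/value attribute list
puts [xsxp::fetch $pxml p %CHILDREN]     ;# list of its child pxmls
puts [xsxp::fetch $pxml p %PCDATA]       ;# text directly inside the p element

# The hr element has no text, so the two PCDATA selectors behave differently:
# %PCDATA? is documented to return an empty string, %PCDATA to throw an error.
set maybe  [xsxp::fetch $pxml hr %PCDATA?]
set failed [catch {xsxp::fetch $pxml hr %PCDATA} msg]
puts "empty=[expr {$maybe eq {}}] failed=$failed"
```

Using catch around the strict %PCDATA form is one way to tell "no text present" apart from "empty text" when that distinction matters.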
129
130
131 xsxp::fetchall pxml_list path ?part?
132 This iterates over each PXML in pxml_list (which must be a list
133 of pxmls) selecting the indicated path from it, building a new
134 list with the selected data, and returning that new list.
135
136 For example, pxml_list might be the %CHILDREN of a particular
137 element, and the path and part might select from each child a
138 sub-element in which we're interested.
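
The %CHILDREN use case mentioned above might look like the sketch below. It assumes the xsxp package is installed; the pxml is hand-built and the people/person/name tags are hypothetical.

```tcl
package require xsxp

# Hand-built pxml for
# <people><person><name>Ann</name></person><person><name>Bob</name></person></people>
set pxml {people {} {person {} {name {} {%PCDATA {} Ann}}} {person {} {name {} {%PCDATA {} Bob}}}}

# An empty path selects pxml itself, so this yields the list of person elements.
set persons [xsxp::fetch $pxml {} %CHILDREN]

# Pull the text of the name child out of every person in one call.
set names [xsxp::fetchall $persons name %PCDATA]

puts $names
```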

xsxp::only pxml tagname
    This iterates over the direct children of pxml and selects only
    those with tagname as their tag. Returns a list of matching
    elements.
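
A short sketch of filtering children by tag, assuming the xsxp package is installed and using a hand-built pxml (an assumed example value). The result is a list of pxmls, so it can be handed to xsxp::fetchall.

```tcl
package require xsxp

# Hand-built pxml for <list><item>a</item><note>n</note><item>b</item></list>
set pxml {list {} {item {} {%PCDATA {} a}} {note {} {%PCDATA {} n}} {item {} {%PCDATA {} b}}}

# Keep only the direct children tagged item; the note child is dropped.
set items [xsxp::only $pxml item]

# An empty path selects each item itself; %PCDATA extracts its text.
set texts [xsxp::fetchall $items {} %PCDATA]

puts "[llength $items] items: $texts"
```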

xsxp::prettyprint pxml ?chan?
    This outputs to chan (default stdout) a pretty-printed version of
    pxml.
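
A typical debugging round trip might look like the sketch below: parse a small document, then dump the resulting structure. It assumes the xsxp and xml packages are installed; the XML text is made up for the illustration, and the exact layout of the printed output is not specified here.

```tcl
package require xsxp

# A small, made-up document without mixed content.
set xml {<list><item id="1">a</item><item id="2">b</item></list>}

set pxml [xsxp::parse $xml]

# Dump the parsed structure to the default channel (stdout) ...
xsxp::prettyprint $pxml

# ... or to an explicitly named channel.
xsxp::prettyprint $pxml stderr
```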

BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category amazon-s3
of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please
also report any ideas for enhancements you may have for either package
and/or documentation.

When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.

Note further that attachments are strongly preferred over inlined
patches. Attachments can be made by going to the Edit form of the
ticket immediately after its creation, and then using the left-most
button in the secondary navigation bar.

KEYWORDS
dom, parser, xml

CATEGORY
Text processing

COPYRIGHT
Copyright (c) 2006 Darren New. All Rights Reserved.

tcllib                               1.0                               xsxp(n)