xsxp(n)                 eXtremely Simple Xml Parser                 xsxp(n)

______________________________________________________________________________

xsxp - eXtremely Simple Xml Parser

package require Tcl 8.4

package require xml

xsxp::parse xml

xsxp::fetch pxml path ?part?

xsxp::fetchall pxml_list path ?part?

xsxp::only pxml tagname

xsxp::prettyprint pxml ?chan?

_________________________________________________________________

This package provides a simple interface to parse XML into a pure-value
list. It also provides accessor routines to pull out specific subtags,
not unlike DOM access. This package was written for and is used by
Darren New's Amazon S3 access package.

This is pretty lame, but I needed something like this for S3, and at
the time, TclDOM would not work with the then-new Tcl 8.5 because of
version number problems.

In addition, this is a pure-value implementation. There is no garbage
to clean up in the event of a thrown error, for example. This
simplifies the code for sufficiently small XML documents, which is what
Amazon's S3 guarantees.

Copyright 2006 Darren New. All Rights Reserved. NO WARRANTIES OF ANY
TYPE ARE PROVIDED. COPYING OR USE INDEMNIFIES THE AUTHOR IN ALL WAYS.
This software is licensed under essentially the same terms as Tcl. See
LICENSE.txt for the terms.

The package implements five rather simple procedures. One parses
(xsxp::parse), one is for debugging (xsxp::prettyprint), and the other
three (xsxp::fetch, xsxp::fetchall, and xsxp::only) pull various parts
of the parsed document out for processing.

xsxp::parse xml
    This parses an XML document (using the standard xml tcllib module
    in a SAX sort of way) and builds a data structure, which it returns
    if the parsing succeeded. The return value is referred to herein
    as a "pxml", or "parsed xml". A pxml is a list of two or more
    elements:

    ·   The first element is the name of the tag.

    ·   The second element is an array-get formatted list of
        key/value pairs. The keys are attribute names and the
        values are attribute values. This is an empty list if
        there are no attributes on the tag.

    ·   The third through last elements are the children of the
        node, if any. Each child is, recursively, a pxml.

    ·   Note that if the first element, i.e. the tag name, is
        "%PCDATA", then the attributes will be empty and the
        third element will be the text of the element. In
        addition, if an element's content consists only of PCDATA,
        it will have only one child, and all the PCDATA will be
        concatenated. In other words, this parser works poorly
        for XML with elements that contain both child tags and
        PCDATA. Since Amazon S3 does not do this (and for that
        matter most uses of XML where XML is a poor choice don't
        do this), this is probably not a serious limitation.

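As an illustration of the layout described above, the following sketch
builds a pxml by hand and takes it apart with ordinary Tcl list
commands. The XML snippet, tag names, and attribute values are invented
for this example. The literal is what the description implies
xsxp::parse would produce for that snippet, not captured output.

```tcl
# Hypothetical input, shown only for orientation:
#   <Bucket region="us-east-1"><Name>photos</Name></Bucket>
# Hand-written pxml following the documented layout:
#   {tagname {attr value ...} child child ...}
set pxml {Bucket {region us-east-1} {Name {} {%PCDATA {} photos}}}

set tag      [lindex $pxml 0]       ;# tag name of the node
set attrs    [lindex $pxml 1]       ;# array-get style attribute list
set children [lrange $pxml 2 end]   ;# list of child pxmls

# Attributes load directly into an array.
array set a $attrs

# The Name child holds a single %PCDATA child whose third element
# is the text.
set name [lindex $children 0]
set text [lindex $name 2 2]

puts "$tag region=$a(region) name=$text"
```

Because a pxml is a plain list value, nothing has to be freed once the
variables go out of scope.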

xsxp::fetch pxml path ?part?
    pxml is a parsed XML, as returned from xsxp::parse. path is a
    list of element tag names. Each element is the name of a child
    to look up, optionally followed by a hash ("#") and a string of
    digits. An empty list or an initial empty element selects pxml.
    If no hash sign is present, the behavior is as if "#0" had been
    appended to that element. (In addition to a list, slashes can
    separate subparts where convenient.)

    An element of path scans the children at the indicated level for
    the n'th instance (counting from zero) of a child whose tag matches
    the part of the element before the hash sign. If an element is
    simply "#" followed by digits, that indexed child is selected,
    regardless of the tags in the children. Hence, an element of "#3"
    will always select the fourth child of the node under consideration.

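The path rules above can be sketched as follows. The pxml is written by
hand rather than parsed, and its tag names and contents are invented
for illustration. The comments state what the description says each
path selects.

```tcl
package require xsxp

# Hand-written pxml: a List node with three children.
set pxml {List {} {Item {id a} {%PCDATA {} first}} {Note {} {%PCDATA {} memo}} {Item {id b} {%PCDATA {} second}}}

# "Item" behaves like "Item#0": the first child tagged Item.
set first [xsxp::fetch $pxml Item]

# "Item#1": the second child tagged Item; the Note in between is skipped.
set second [xsxp::fetch $pxml Item#1]

# "#1": the second child regardless of its tag, here the Note.
set byIndex [xsxp::fetch $pxml {#1}]

# An empty path selects pxml itself.
set whole [xsxp::fetch $pxml {}]
```

The "#1" element is braced only for readability. Tcl treats "#" as a
comment character solely at the start of a command.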
    part defaults to "%ALL". It can be one of the following
    case-sensitive terms:

    %ALL
        returns the entire selected element.

    %TAGNAME
        returns lindex 0 of the selected element.

    %ATTRIBUTES
        returns lindex 1 of the selected element.

    %CHILDREN
        returns lrange 2 through end of the selected element,
        resulting in a list of elements being returned.

    %PCDATA
        returns a concatenation of all the bodies of direct
        children of this node whose tag is %PCDATA. It throws an
        error if no such children are found. That is,
        part=%PCDATA means return the textual content found in
        that node but not in its child nodes.

    %PCDATA?
        is like %PCDATA, but returns an empty string if no PCDATA
        is found.

    For example, to fetch the first bold text from the fifth paragraph
    of the body of your HTML file,

        xsxp::fetch $pxml {html body p#4 b} %PCDATA

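A further sketch, exercising each value of part on a small hand-written
pxml. The document shape (Result, Owner, ID, Empty) is made up for this
example, and the comments give the results the descriptions above lead
one to expect.

```tcl
package require xsxp

# Hand-written pxml; Owner has one attribute and two children.
set pxml {Result {} {Owner {kind user} {ID {} {%PCDATA {} abc123}} {Empty {}}}}

set tagname  [xsxp::fetch $pxml Owner %TAGNAME]     ;# tag of the Owner node
set attrs    [xsxp::fetch $pxml Owner %ATTRIBUTES]  ;# its attribute list
set children [xsxp::fetch $pxml Owner %CHILDREN]    ;# its child pxmls

# A slash-separated path and a list path are interchangeable.
set id [xsxp::fetch $pxml Owner/ID %PCDATA]         ;# text inside ID

# %PCDATA? tolerates a node with no text.
set none [xsxp::fetch $pxml {Owner Empty} %PCDATA?]

puts "tag=$tagname attrs=$attrs children=[llength $children] id=$id"
```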

xsxp::fetchall pxml_list path ?part?
    This iterates over each pxml in pxml_list (which must be a list
    of pxmls), selecting the indicated path from it, building a new
    list with the selected data, and returning that new list.

    For example, pxml_list might be the %CHILDREN of a particular
    element, and the path and part might select from each child a
    sub-element in which we're interested.

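The usage just described might look like the sketch below: take the
%CHILDREN of a node, then pull the same sub-element out of each child.
The Buckets/Bucket/Name structure is a hand-written stand-in, not real
S3 output.

```tcl
package require xsxp

# Hand-written pxml with two Bucket children, each holding a Name.
set pxml {Buckets {} {Bucket {} {Name {} {%PCDATA {} alpha}}} {Bucket {} {Name {} {%PCDATA {} beta}}}}

# An empty path selects pxml itself, so this yields its children.
set buckets [xsxp::fetch $pxml {} %CHILDREN]

# From every child, fetch the text of its Name sub-element.
set names [xsxp::fetchall $buckets Name %PCDATA]

foreach name $names {
    puts $name
}
```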

xsxp::only pxml tagname
    This iterates over the direct children of pxml and selects only
    those with tagname as their tag. Returns a list of matching
    elements.

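A minimal sketch of filtering children by tag, assuming a hand-written
pxml whose element names (Listing, Prefix, Contents, Key) are chosen
only for illustration:

```tcl
package require xsxp

# Hand-written pxml: one Prefix child and two Contents children.
set pxml {Listing {} {Prefix {} {%PCDATA {} logs}} {Contents {} {Key {} {%PCDATA {} a.txt}}} {Contents {} {Key {} {%PCDATA {} b.txt}}}}

# Keep only the direct children tagged Contents; Prefix is dropped.
set entries [xsxp::only $pxml Contents]

# Each entry is itself a pxml, so xsxp::fetch works on it.
foreach entry $entries {
    puts [xsxp::fetch $entry Key %PCDATA]
}
```

Unlike a path element such as "Contents#1", which selects a single
child, xsxp::only returns every matching child as a list.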

xsxp::prettyprint pxml ?chan?
    This outputs to chan (default stdout) a pretty-printed version
    of pxml.

This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category amazon-s3
of the Tcllib SF Trackers
[http://sourceforge.net/tracker/?group_id=12883]. Please also report
any ideas for enhancements you may have for either package and/or
documentation.

Copyright (c) 2006 Darren New. All Rights Reserved.

amazon-s3                             1.0                             xsxp(n)