1pt::ast(n) Parser Tools pt::ast(n)
2
3
4
5______________________________________________________________________________
6
8 pt::ast - Abstract Syntax Tree Serialization
9
11 package require Tcl 8.5
12
13 package require pt::ast ?1.1?
14
15 ::pt::ast verify serial ?canonvar?
16
17 ::pt::ast verify-as-canonical serial
18
19 ::pt::ast canonicalize serial
20
21 ::pt::ast print serial
22
23 ::pt::ast bottomup cmdprefix ast
24
25 cmdprefix ast
26
27 ::pt::ast topdown cmdprefix pe
28
29 ::pt::ast equal seriala serialb
30
31 ::pt::ast new0 s loc ?child...?
32
33 ::pt::ast new s start end ?child...?
34
35______________________________________________________________________________
36
38 Are you lost ? Do you have trouble understanding this document ? In
39 that case please read the overview provided by the Introduction to
40 Parser Tools. This document is the entrypoint to the whole system the
41 current package is a part of.
42
43 This package provides commands to work with the serializations of
44 abstract syntax trees as managed by the Parser Tools, and specified in
45 section AST serialization format.
46
47 This is a supporting package in the Core Layer of Parser Tools.
48
49 IMAGE: arch_core_support
50
52 ::pt::ast verify serial ?canonvar?
53 This command verifies that the content of serial is a valid
54 serialization of an abstract syntax tree and will throw an error
55 if that is not the case. The result of the command is the empty
56 string.
57
58 If the argument canonvar is specified it is interpreted as the
59 name of a variable in the calling context. This variable will be
60 written to if and only if serial is a valid regular serializa‐
61 tion. Its value will be a boolean, with True indicating that the
62 serialization is not only valid, but also canonical. False will
63 be written for a valid, but non-canonical serialization.
64
65 For the specification of serializations see the section AST
66 serialization format.
67
68 ::pt::ast verify-as-canonical serial
69 This command verifies that the content of serial is a valid
70 canonical serialization of an abstract syntax tree and will
71 throw an error if that is not the case. The result of the com‐
72 mand is the empty string.
73
74 For the specification of canonical serializations see the sec‐
75 tion AST serialization format.
76
77 ::pt::ast canonicalize serial
78 This command assumes that the content of serial is a valid regu‐
79 lar serialization of an abstract syntax and will throw an error
80 if that is not the case.
81
82 It will then convert the input into the canonical serialization
83 of the contained tree and return it as its result. If the input
84 is already canonical it will be returned unchanged.
85
86 For the specification of regular and canonical serializations
87 see the section AST serialization format.
88
89 ::pt::ast print serial
90 This command assumes that the argument serial contains a valid
91 serialization of an abstract syntax tree and returns a string
92 containing that tree in a human readable form.
93
94 The exact format of this form is not specified and cannot be
95 relied on for parsing or other machine-based activities.
96
97 For the specification of serializations see the section AST
98 serialization format.
99
100 ::pt::ast bottomup cmdprefix ast
101 This command walks the abstract syntax tree ast from the bottom
102 up to the root, invoking the command prefix cmdprefix for each
103 node. This implies that the children of a node N are handled
104 before N.
105
106 The command prefix has the signature
107
108 cmdprefix ast
109 I.e. it is invoked with the ast node the walk is cur‐
110 rently at.
111
112 The result returned by the command prefix replaces ast in
113 the node it was a child of, allowing transformations of
114 the tree.
115
116 This also means that for all inner node the contents of
117 the children elements are the results of the command pre‐
118 fix invoked for the children of this node.
119
120 ::pt::ast topdown cmdprefix pe
121 This command walks the abstract syntax tree ast from the root
122 down to the leaves, invoking the command prefix cmdprefix for
123 each node. This implies that the children of a node N are han‐
124 dled after N.
125
126 The command prefix has the same signature as for bottomup, see
127 above.
128
129 The result returned by the command prefix is ignored.
130
131 ::pt::ast equal seriala serialb
132 This command tests the two sbstract syntax trees seriala and
133 serialb for structural equality. The result of the command is a
134 boolean value. It will be set to true if the trees are identi‐
135 cal, and false otherwise.
136
137 String equality is usable only if we can assume that the two
138 trees are pure Tcl lists.
139
140 ::pt::ast new0 s loc ?child...?
141 This command command constructs the ast for a nonterminal node
142 refering refering to the symbol s at position loc in the input,
143 and the set of child nodes child ..., from left right. The lat‐
144 ter may be empty. The constructed node is returned as the result
145 of the command. The end position is loc-1, i.e. one character
146 before the start. This type of node is possible for rules con‐
147 taining optional parts.
148
149 ::pt::ast new s start end ?child...?
150 This command command constructs the ast for a nonterminal node
151 refering to the symbol s covering the range of positions start
152 to end in the input, and the set of child nodes child ..., from
153 left right. The latter may be empty. The constructed node is
154 returned as the result of the command.
155
157 Here we specify the format used by the Parser Tools to serialize
158 Abstract Syntax Trees (ASTs) as immutable values for transport, compar‐
159 ison, etc.
160
161 Each node in an AST represents a nonterminal symbol of a grammar, and
162 the range of tokens/characters in the input covered by it. ASTs do not
163 contain terminal symbols, i.e. tokens/characters. These can be recov‐
164 ered from the input given a symbol's location.
165
166 We distinguish between regular and canonical serializations. While a
167 tree may have more than one regular serialization only exactly one of
168 them will be canonical.
169
170 Regular serialization
171
172 [1] The serialization of any AST is the serialization of its
173 root node.
174
175 [2] The serialization of any node is a Tcl list containing at
176 least three elements.
177
178 [1] The first element is the name of the nonterminal
179 symbol stored in the node.
180
181 [2] The second and third element are the locations of
182 the first and last token in the token stream the
183 node represents (covers).
184
185 [1] Locations are provided as non-negative
186 integer offsets from the beginning of the
187 token stream, with the first token found in
188 the stream located at offset 0 (zero).
189
190 [2] The end location has to be equal to or
191 larger than the start location.
192
193 [3] All elements after the first three represent the
194 children of the node, which are themselves nodes.
195 This means that the serializations of nodes with‐
196 out children, i.e. leaf nodes, have exactly three
197 elements. The children are stored in the list
198 with the leftmost child first, and the rightmost
199 child last.
200
201 Canonical serialization
202 The canonical serialization of an abstract syntax tree has the
203 format as specified in the previous item, and then additionally
204 satisfies the constraints below, which make it unique among all
205 the possible serializations of this tree.
206
207 [1] The string representation of the value is the canonical
208 representation of a pure Tcl list. I.e. it does not con‐
209 tain superfluous whitespace.
210
211 EXAMPLE
212 Assuming the parsing expression grammar below
213
214 PEG calculator (Expression)
215 Digit <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
216 Sign <- '-' / '+' ;
217 Number <- Sign? Digit+ ;
218 Expression <- Term (AddOp Term)* ;
219 MulOp <- '*' / '/' ;
220 Term <- Factor (MulOp Factor)* ;
221 AddOp <- '+'/'-' ;
222 Factor <- '(' Expression ')' / Number ;
223 END;
224
225
226 and the input string
227
228 120+5
229 then a parser should deliver the abstract syntax tree below (except for
230 whitespace)
231
232 set ast {Expression 0 4
233 {Factor 0 4
234 {Term 0 2
235 {Number 0 2
236 {Digit 0 0}
237 {Digit 1 1}
238 {Digit 2 2}
239 }
240 }
241 {AddOp 3 3}
242 {Term 4 4
243 {Number 4 4
244 {Digit 4 4}
245 }
246 }
247 }
248 }
249
250
251 Or, more graphical
252
253 .nf +- Digit 0 0 | 1 | | +- Term 0 2 --- Number 0 2 -+-
254 Digit 1 1 | 2 | | | |
255 +- Digit 2 2 | 0 | | Expression
256 0 4 --- Factor 0 4 -+----------------------------- AddOp 3 3 | + |
257 | +- Term 4 4 --- Number 4 4 --- Digit 4 4 | 5 .fi
258
260 This document, and the package it describes, will undoubtedly contain
261 bugs and other problems. Please report such in the category pt of the
262 Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
263 report any ideas for enhancements you may have for either package
264 and/or documentation.
265
266 When proposing code changes, please provide unified diffs, i.e the out‐
267 put of diff -u.
268
269 Note further that attachments are strongly preferred over inlined
270 patches. Attachments can be made by going to the Edit form of the
271 ticket immediately after its creation, and then using the left-most
272 button in the secondary navigation bar.
273
275 EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
276 matching, parser, parsing expression, parsing expression grammar, push
277 down automaton, recursive descent, state, top-down parsing languages,
278 transducer
279
281 Parsing and Grammars
282
284 Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>
285
286
287
288
289tcllib 1.1 pt::ast(n)