1pt::ast(n)                       Parser Tools                       pt::ast(n)
2
3
4
5______________________________________________________________________________
6

NAME

8       pt::ast - Abstract Syntax Tree Serialization
9

SYNOPSIS

11       package require Tcl  8.5
12
13       package require pt::ast  ?1.1?
14
15       ::pt::ast verify serial ?canonvar?
16
17       ::pt::ast verify-as-canonical serial
18
19       ::pt::ast canonicalize serial
20
21       ::pt::ast print serial
22
23       ::pt::ast bottomup cmdprefix ast
24
25       cmdprefix ast
26
27       ::pt::ast topdown cmdprefix pe
28
29       ::pt::ast equal seriala serialb
30
31       ::pt::ast new0 s loc ?child...?
32
33       ::pt::ast new s start end ?child...?
34
35______________________________________________________________________________
36

DESCRIPTION

38       Are  you  lost ?  Do you have trouble understanding this document ?  In
39       that case please read the overview  provided  by  the  Introduction  to
40       Parser  Tools.  This document is the entrypoint to the whole system the
41       current package is a part of.
42
43       This package provides commands to work with the serializations  of  ab‐
44       stract  syntax  trees  as managed by the Parser Tools, and specified in
45       section AST serialization format.
46
47       This is a supporting package in the Core Layer of Parser Tools.
48
49       IMAGE: arch_core_support
50

API

52       ::pt::ast verify serial ?canonvar?
53              This command verifies that the content of serial is a valid  se‐
54              rialization  of  an abstract syntax tree and will throw an error
55              if that is not the case. The result of the command is the  empty
56              string.
57
58              If  the  argument canonvar is specified it is interpreted as the
59              name of a variable in the calling context. This variable will be
60              written  to  if and only if serial is a valid regular serializa‐
61              tion. Its value will be a boolean, with True indicating that the
62              serialization  is not only valid, but also canonical. False will
63              be written for a valid, but non-canonical serialization.
64
65              For the specification of serializations see the section AST  se‐
66              rialization format.
67
68       ::pt::ast verify-as-canonical serial
69              This  command  verifies  that  the  content of serial is a valid
70              canonical serialization of an  abstract  syntax  tree  and  will
71              throw  an  error if that is not the case. The result of the com‐
72              mand is the empty string.
73
74              For the specification of canonical serializations see  the  sec‐
75              tion AST serialization format.
76
77       ::pt::ast canonicalize serial
78              This command assumes that the content of serial is a valid regu‐
79              lar serialization of an abstract syntax and will throw an  error
80              if that is not the case.
81
82              It  will then convert the input into the canonical serialization
83              of the contained tree and return it as its result. If the  input
84              is already canonical it will be returned unchanged.
85
86              For  the  specification  of regular and canonical serializations
87              see the section AST serialization format.
88
89       ::pt::ast print serial
90              This command assumes that the argument serial contains  a  valid
91              serialization  of  an  abstract syntax tree and returns a string
92              containing that tree in a human readable form.
93
94              The exact format of this form is not specified and cannot be re‐
95              lied on for parsing or other machine-based activities.
96
97              For  the specification of serializations see the section AST se‐
98              rialization format.
99
100       ::pt::ast bottomup cmdprefix ast
101              This command walks the abstract syntax tree ast from the  bottom
102              up  to  the root, invoking the command prefix cmdprefix for each
103              node. This implies that the children of a node N are handled be‐
104              fore N.
105
106              The command prefix has the signature
107
108              cmdprefix ast
109                     I.e.  it  is  invoked  with the ast node the walk is cur‐
110                     rently at.
111
112                     The result returned by the command prefix replaces ast in
113                     the  node  it was a child of, allowing transformations of
114                     the tree.
115
116                     This also means that for all inner node the  contents  of
117                     the children elements are the results of the command pre‐
118                     fix invoked for the children of this node.
119
120       ::pt::ast topdown cmdprefix pe
121              This command walks the abstract syntax tree ast  from  the  root
122              down  to  the  leaves, invoking the command prefix cmdprefix for
123              each node. This implies that the children of a node N  are  han‐
124              dled after N.
125
126              The  command  prefix has the same signature as for bottomup, see
127              above.
128
129              The result returned by the command prefix is ignored.
130
131       ::pt::ast equal seriala serialb
132              This command tests the two sbstract syntax trees seriala and se‐
133              rialb  for  structural  equality. The result of the command is a
134              boolean value. It will be set to true if the trees  are  identi‐
135              cal, and false otherwise.
136
137              String  equality  is  usable  only if we can assume that the two
138              trees are pure Tcl lists.
139
140       ::pt::ast new0 s loc ?child...?
141              This command command constructs the ast for a  nonterminal  node
142              refering  refering to the symbol s at position loc in the input,
143              and the set of child nodes child ..., from left right. The  lat‐
144              ter may be empty. The constructed node is returned as the result
145              of the command. The end position is loc-1,  i.e.  one  character
146              before  the  start. This type of node is possible for rules con‐
147              taining optional parts.
148
149       ::pt::ast new s start end ?child...?
150              This command command constructs the ast for a  nonterminal  node
151              refering  to  the symbol s covering the range of positions start
152              to end in the input, and the set of child nodes child ...,  from
153              left right. The latter may be empty. The constructed node is re‐
154              turned as the result of the command.
155

AST SERIALIZATION FORMAT

157       Here we specify the format used by the Parser Tools  to  serialize  Ab‐
158       stract  Syntax Trees (ASTs) as immutable values for transport, compari‐
159       son, etc.
160
161       Each node in an AST represents a nonterminal symbol of a  grammar,  and
162       the  range of tokens/characters in the input covered by it. ASTs do not
163       contain terminal symbols, i.e. tokens/characters. These can  be  recov‐
164       ered from the input given a symbol's location.
165
166       We  distinguish  between regular and canonical serializations.  While a
167       tree may have more than one regular serialization only exactly  one  of
168       them will be canonical.
169
170       Regular serialization
171
172              [1]    The  serialization of any AST is the serialization of its
173                     root node.
174
175              [2]    The serialization of any node is a Tcl list containing at
176                     least three elements.
177
178                     [1]    The  first  element is the name of the nonterminal
179                            symbol stored in the node.
180
181                     [2]    The second and third element are the locations  of
182                            the  first  and last token in the token stream the
183                            node represents (covers).
184
185                            [1]    Locations are provided as non-negative  in‐
186                                   teger offsets from the beginning of the to‐
187                                   ken stream, with the first token  found  in
188                                   the stream located at offset 0 (zero).
189
190                            [2]    The  end  location  has  to  be equal to or
191                                   larger than the start location.
192
193                     [3]    All elements after the first three  represent  the
194                            children  of the node, which are themselves nodes.
195                            This means that the serializations of nodes  with‐
196                            out  children, i.e. leaf nodes, have exactly three
197                            elements.  The children are  stored  in  the  list
198                            with  the  leftmost child first, and the rightmost
199                            child last.
200
201       Canonical serialization
202              The canonical serialization of an abstract syntax tree  has  the
203              format  as specified in the previous item, and then additionally
204              satisfies the constraints below, which make it unique among  all
205              the possible serializations of this tree.
206
207              [1]    The  string  representation of the value is the canonical
208                     representation of a pure Tcl list. I.e. it does not  con‐
209                     tain superfluous whitespace.
210
211   EXAMPLE
212       Assuming the parsing expression grammar below
213
214              PEG calculator (Expression)
215                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
216                  Sign       <- '-' / '+'                                     ;
217                  Number     <- Sign? Digit+                                  ;
218                  Expression <- Term (AddOp Term)*                            ;
219                  MulOp      <- '*' / '/'                                     ;
220                  Term       <- Factor (MulOp Factor)*                        ;
221                  AddOp      <- '+'/'-'                                       ;
222                  Factor     <- '(' Expression ')' / Number                   ;
223              END;
224
225
226       and the input string
227
228               120+5
229       then a parser should deliver the abstract syntax tree below (except for
230       whitespace)
231
232              set ast {Expression 0 4
233                  {Factor 0 4
234                      {Term 0 2
235                          {Number 0 2
236                              {Digit 0 0}
237                              {Digit 1 1}
238                              {Digit 2 2}
239                          }
240                      }
241                      {AddOp 3 3}
242                      {Term 4 4
243                          {Number 4 4
244                              {Digit 4 4}
245                          }
246                      }
247                  }
248              }
249
250
251       Or, more graphical
252
253       .nf +- Digit 0 0 | 1 |            | +- Term 0 2  ---  Number  0  2  -+-
254       Digit   1   1   |   2   |                            |             |  |
255       +- Digit 2 2 | 0 |                                        |  Expression
256       0  4  ---  Factor  0  4 -+----------------------------- AddOp 3 3 | + |
257       | +- Term 4 4 --- Number 4 4 --- Digit 4 4 | 5 .fi
258

BUGS, IDEAS, FEEDBACK

260       This document, and the package it describes, will  undoubtedly  contain
261       bugs  and other problems.  Please report such in the category pt of the
262       Tcllib Trackers  [http://core.tcl.tk/tcllib/reportlist].   Please  also
263       report  any  ideas  for  enhancements  you  may have for either package
264       and/or documentation.
265
266       When proposing code changes, please provide unified diffs, i.e the out‐
267       put of diff -u.
268
269       Note  further  that  attachments  are  strongly  preferred over inlined
270       patches. Attachments can be made by going  to  the  Edit  form  of  the
271       ticket  immediately  after  its  creation, and then using the left-most
272       button in the secondary navigation bar.
273

KEYWORDS

275       EBNF, LL(k), PEG, TDPL, context-free  languages,  expression,  grammar,
276       matching,  parser, parsing expression, parsing expression grammar, push
277       down automaton, recursive descent, state, top-down  parsing  languages,
278       transducer
279

CATEGORY

281       Parsing and Grammars
282
284       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>
285
286
287
288
289tcllib                                1.1                           pt::ast(n)
Impressum