pt::pe::op(n)                     Parser Tools                     pt::pe::op(n)

______________________________________________________________________________

NAME
pt::pe::op - Parsing Expression Utilities

SYNOPSIS
package require Tcl 8.5

package require pt::pe::op ?1.0.1?

package require pt::pe ?1?

package require struct::set

::pt::pe::op drop dropset pe

::pt::pe::op rename nt ntnew pe

::pt::pe::op called pe

::pt::pe::op flatten pe

::pt::pe::op fusechars pe

______________________________________________________________________________

DESCRIPTION
Are you lost? Do you have trouble understanding this document? In that
case please read the overview provided by the Introduction to Parser
Tools. That document is the entry point to the whole system the current
package is a part of.

This package provides additional commands to work with the
serializations of parsing expressions as managed by the PEG and related
packages, and specified in section PE serialization format.

This is an internal package, for use by the higher-level packages
handling PEGs, their conversion into and out of various other formats,
or other uses.

API
::pt::pe::op drop dropset pe
    This command removes all occurrences of any of the nonterminal
    symbols in the set dropset from the parsing expression pe, and
    simplifies it. This may result in the expression becoming
    epsilon, i.e. matching only the empty string.
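
A hedged usage sketch of drop, assuming a Tcllib installation that provides pt::pe::op. The expression and the variable names are made up for illustration. The exact shape of the simplified result is not spelled out above, so the script prints it instead of asserting a particular value.

```tcl
package require Tcl 8.5
package require pt::pe::op

# Serialization of the expression: Term (AddOp Term)*
set pe {x {n Term} {* {x {n AddOp} {n Term}}}}

# dropset is a set of nonterminal names, written here as a one-element list.
set reduced [::pt::pe::op drop {AddOp} $pe]

# Print the simplified expression; AddOp should no longer be referenced.
puts $reduced
```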

::pt::pe::op rename nt ntnew pe
    This command renames all occurrences of the nonterminal nt in the
    parsing expression pe to ntnew.
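
A hedged usage sketch of rename, assuming a Tcllib installation that provides pt::pe::op. The nonterminal names AddOp and Operator are illustrative only.

```tcl
package require Tcl 8.5
package require pt::pe::op

# Serialization of the expression: Term (AddOp Term)*
set pe {x {n Term} {* {x {n AddOp} {n Term}}}}

# Replace every reference to AddOp with a reference to Operator.
set renamed [::pt::pe::op rename AddOp Operator $pe]

puts $renamed
```

The input value is not modified; the renamed expression is the value returned by the command.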

::pt::pe::op called pe
    This command extracts the set of all nonterminal symbols used,
    i.e. 'called', in the parsing expression pe.
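
A hedged usage sketch of called, assuming a Tcllib installation that provides pt::pe::op. Because the result is a set, the sketch does not rely on the order of its elements and sorts them before printing.

```tcl
package require Tcl 8.5
package require pt::pe::op

# Serialization of the expression: Term (AddOp Term)*
set pe {x {n Term} {* {x {n AddOp} {n Term}}}}

# Collect the nonterminals referenced anywhere in the expression.
set symbols [lsort [::pt::pe::op called $pe]]

puts $symbols
```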

::pt::pe::op flatten pe
    This command transforms the parsing expression by eliminating
    sequences nested in sequences, and choices in choices, lifting
    the children of the nested expression into the parent. It
    further eliminates all sequences and choices with only one
    child, as these are redundant.

    The resulting parsing expression is returned as the result of
    the command.
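
A hedged usage sketch of flatten, assuming a Tcllib installation that provides pt::pe::op. The input is a made-up expression containing a sequence nested in a sequence and a choice with a single child. Going by the description above, both should disappear, leaving one flat sequence of A, B and C.

```tcl
package require Tcl 8.5
package require pt::pe::op

# Outer sequence holding a nested sequence and a single-child choice.
set pe {x {x {n A} {n B}} {/ {n C}}}

# Lift nested children into the parent and drop the redundant choice.
set flat [::pt::pe::op flatten $pe]

puts $flat
```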

::pt::pe::op fusechars pe
    This command transforms the parsing expression by fusing
    adjacent terminals in sequences, and adjacent terminals and
    ranges in choices. In this way it (re)constructs high-level
    strings and character classes.

    The resulting pseudo-parsing expression is returned as the
    result of the command and may contain the pseudo-operators str
    for character sequences, aka strings, and cl for character
    choices, aka character classes.

    The result is called a pseudo-parsing expression because it is
    not a true parsing expression anymore, and will fail a check
    with ::pt::pe verify if the new pseudo-operators are present in
    the result, but is otherwise of sound structure for a parsing
    expression. Notably, the commands ::pt::pe bottomup and
    ::pt::pe topdown will process them without trouble.
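
A hedged usage sketch of fusechars, assuming a Tcllib installation that provides pt::pe::op. The input expression is made up for illustration. The exact layout of the str and cl pseudo-operators is not specified above, so the result is printed instead of being compared against a fixed value.

```tcl
package require Tcl 8.5
package require pt::pe::op

# The terminals 'i' and 'f' in sequence, followed by a choice of two digits.
set pe {x {t i} {t f} {/ {t 0} {t 1}}}

# Fuse adjacent terminals into strings and character classes.
set fused [::pt::pe::op fusechars $pe]

# The result may use the pseudo-operators str and cl.
puts $fused
```

A value produced this way is meant for consumers that understand the pseudo-operators, for example code generators. It should not be fed back into commands that expect a verified parsing expression.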

PE SERIALIZATION FORMAT
Here we specify the format used by the Parser Tools to serialize
Parsing Expressions as immutable values for transport, comparison, etc.

We distinguish between regular and canonical serializations. While a
parsing expression may have more than one regular serialization,
exactly one of them will be canonical.

Regular serialization

    Atomic Parsing Expressions

    [1]  The string epsilon is an atomic parsing expression. It
         matches the empty string.

    [2]  The string dot is an atomic parsing expression. It matches
         any character.

    [3]  The string alnum is an atomic parsing expression. It matches
         any Unicode alphabet or digit character. This is a custom
         extension of PEs based on Tcl's builtin command string is.

    [4]  The string alpha is an atomic parsing expression. It matches
         any Unicode alphabet character. This is a custom extension
         of PEs based on Tcl's builtin command string is.

    [5]  The string ascii is an atomic parsing expression. It matches
         any Unicode character below U+0080. This is a custom
         extension of PEs based on Tcl's builtin command string is.

    [6]  The string control is an atomic parsing expression. It
         matches any Unicode control character. This is a custom
         extension of PEs based on Tcl's builtin command string is.

    [7]  The string digit is an atomic parsing expression. It matches
         any Unicode digit character. Note that this includes
         characters outside of the [0..9] range. This is a custom
         extension of PEs based on Tcl's builtin command string is.

    [8]  The string graph is an atomic parsing expression. It matches
         any Unicode printing character, except for space. This is a
         custom extension of PEs based on Tcl's builtin command
         string is.

    [9]  The string lower is an atomic parsing expression. It matches
         any Unicode lower-case alphabet character. This is a custom
         extension of PEs based on Tcl's builtin command string is.

    [10] The string print is an atomic parsing expression. It matches
         any Unicode printing character, including space. This is a
         custom extension of PEs based on Tcl's builtin command
         string is.

    [11] The string punct is an atomic parsing expression. It matches
         any Unicode punctuation character. This is a custom
         extension of PEs based on Tcl's builtin command string is.

    [12] The string space is an atomic parsing expression. It matches
         any Unicode space character. This is a custom extension of
         PEs based on Tcl's builtin command string is.

    [13] The string upper is an atomic parsing expression. It matches
         any Unicode upper-case alphabet character. This is a custom
         extension of PEs based on Tcl's builtin command string is.

    [14] The string wordchar is an atomic parsing expression. It
         matches any Unicode word character. This is any alphanumeric
         character (see alnum), and any connector punctuation
         characters (e.g. underscore). This is a custom extension of
         PEs based on Tcl's builtin command string is.

    [15] The string xdigit is an atomic parsing expression. It
         matches any hexadecimal digit character. This is a custom
         extension of PEs based on Tcl's builtin command string is.

    [16] The string ddigit is an atomic parsing expression. It
         matches any decimal digit character. This is a custom
         extension of PEs based on Tcl's builtin command regexp.

    [17] The expression [list t x] is an atomic parsing expression.
         It matches the terminal string x.

    [18] The expression [list n A] is an atomic parsing expression.
         It matches the nonterminal A.
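
As a minimal sketch of the list above, the following plain Tcl script builds a few atomic parsing expressions by hand. It needs no Tcllib package. The variable names and the nonterminal name Term are illustrative only.

```tcl
package require Tcl 8.5

# Atomic parsing expressions are bare words or two-element lists.
set any  dot             ;# matches any character
set a    [list t a]      ;# the terminal 'a'
set term [list n Term]   ;# the nonterminal Term

puts $a      ;# prints: t a
puts $term   ;# prints: n Term

# The named character classes mirror the classes of Tcl's 'string is'.
puts [string is alpha -strict x]   ;# prints: 1
puts [string is digit -strict x]   ;# prints: 0
```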

    Combined Parsing Expressions

    [1]  For parsing expressions e1, e2, ... the result of
         [list / e1 e2 ... ] is a parsing expression as well. This is
         the ordered choice, aka prioritized choice.

    [2]  For parsing expressions e1, e2, ... the result of
         [list x e1 e2 ... ] is a parsing expression as well. This is
         the sequence.

    [3]  For a parsing expression e the result of [list * e] is a
         parsing expression as well. This is the Kleene closure,
         describing zero or more repetitions.

    [4]  For a parsing expression e the result of [list + e] is a
         parsing expression as well. This is the positive Kleene
         closure, describing one or more repetitions.

    [5]  For a parsing expression e the result of [list & e] is a
         parsing expression as well. This is the and lookahead
         predicate.

    [6]  For a parsing expression e the result of [list ! e] is a
         parsing expression as well. This is the not lookahead
         predicate.

    [7]  For a parsing expression e the result of [list ? e] is a
         parsing expression as well. This is the optional input.
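
The operators above compose with plain list construction. The following sketch uses only core Tcl, and the operand names are made up for illustration.

```tcl
package require Tcl 8.5

# Operands: two terminals and one nonterminal.
set a [list t a]
set b [list t b]
set n [list n Number]

set choice   [list / $a $b]       ;# a / b
set sequence [list x $a $n]       ;# a Number
set repeat   [list * $choice]     ;# (a / b)*
set optional [list ? $sequence]   ;# (a Number)?

puts $choice     ;# prints: / {t a} {t b}
puts $repeat     ;# prints: * {/ {t a} {t b}}
puts $optional   ;# prints: ? {x {t a} {n Number}}
```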

Canonical serialization
    The canonical serialization of a parsing expression has the format
    as specified in the previous item, and then additionally satisfies
    the constraints below, which make it unique among all the possible
    serializations of this parsing expression.

    [1]  The string representation of the value is the canonical
         representation of a pure Tcl list. I.e. it does not contain
         superfluous whitespace.

    [2]  Terminals are not encoded as ranges (where start and end of
         the range are identical).
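
Constraint [1] can be illustrated with a small sketch in core Tcl. The helper normalize_ws is hypothetical and not part of the package. It rebuilds every level of an expression with list, which yields the canonical list representation without superfluous whitespace. It does not address constraint [2].

```tcl
package require Tcl 8.5

# Hypothetical helper, not part of pt::pe::op: rebuild the expression
# level by level so that each list gets its canonical string form.
proc normalize_ws {pe} {
    set op [lindex $pe 0]
    if {$op in {/ x * + & ! ?}} {
        # Combined expression: normalize each child, then rebuild.
        set children {}
        foreach child [lrange $pe 1 end] {
            lappend children [normalize_ws $child]
        }
        return [list $op {*}$children]
    }
    # Atomic expression: a bare word, or a list like {t a} or {n Term}.
    return [list {*}$pe]
}

# A regular serialization with superfluous whitespace.
set regular "x   {n  Term}  {*   {t a}}"

puts [normalize_ws $regular]   ;# prints: x {n Term} {* {t a}}
```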

EXAMPLE
Take the parsing expression shown on the right-hand side of the rule

    Expression <- Term (AddOp Term)*

Its canonical serialization (except for whitespace) is

    {x {n Term} {* {x {n AddOp} {n Term}}}}
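
The same value can be assembled with list, as in this core Tcl sketch with illustrative variable names. The outer braces in the listing above are Tcl quoting for a single word, so they do not appear when the value is printed.

```tcl
package require Tcl 8.5

# Expression <- Term (AddOp Term)*
set term  [list n Term]
set addop [list n AddOp]
set pe    [list x $term [list * [list x $addop $term]]]

puts $pe   ;# prints: x {n Term} {* {x {n AddOp} {n Term}}}
```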

BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category pt of the
Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
report any ideas for enhancements you may have for either package
and/or documentation.

When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.

Note further that attachments are strongly preferred over inlined
patches. Attachments can be made by going to the Edit form of the
ticket immediately after its creation, and then using the left-most
button in the secondary navigation bar.

KEYWORDS
EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
matching, parser, parsing expression, parsing expression grammar, push
down automaton, recursive descent, state, top-down parsing languages,
transducer

CATEGORY
Parsing and Grammars

COPYRIGHT
Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                               1.0.1                       pt::pe::op(n)