pt::pe(n) Parser Tools pt::pe(n)

______________________________________________________________________________

NAME
pt::pe - Parsing Expression Serialization

SYNOPSIS
package require Tcl 8.5

package require pt::pe ?1.0.1?

package require char

::pt::pe verify serial ?canonvar?

::pt::pe verify-as-canonical serial

::pt::pe canonicalize serial

::pt::pe print serial

::pt::pe bottomup cmdprefix pe

cmdprefix pe op arguments

::pt::pe topdown cmdprefix pe

::pt::pe equal seriala serialb

::pt::pe epsilon

::pt::pe dot

::pt::pe alnum

::pt::pe alpha

::pt::pe ascii

::pt::pe control

::pt::pe digit

::pt::pe graph

::pt::pe lower

::pt::pe print

::pt::pe punct

::pt::pe space

::pt::pe upper

::pt::pe wordchar

::pt::pe xdigit

::pt::pe ddigit

::pt::pe terminal t

::pt::pe range ta tb

::pt::pe nonterminal nt

::pt::pe choice pe...

::pt::pe sequence pe...

::pt::pe repeat0 pe

::pt::pe repeat1 pe

::pt::pe optional pe

::pt::pe ahead pe

::pt::pe notahead pe

______________________________________________________________________________

DESCRIPTION
Are you lost? Do you have trouble understanding this document? In
that case please read the overview provided by the Introduction to
Parser Tools. That document is the entry point to the whole system of
which the current package is a part.

This package provides commands to work with the serializations of pars‐
ing expressions as managed by the Parser Tools, and specified in sec‐
tion PE serialization format.

This is a supporting package in the Core Layer of Parser Tools.

IMAGE: arch_core_support

API
::pt::pe verify serial ?canonvar?
This command verifies that the content of serial is a valid se‐
rialization of a parsing expression and will throw an error if
that is not the case. The result of the command is the empty
string.

If the argument canonvar is specified it is interpreted as the
name of a variable in the calling context. This variable will be
written to if and only if serial is a valid regular serializa‐
tion. Its value will be a boolean, with True indicating that the
serialization is not only valid, but also canonical. False will
be written for a valid, but non-canonical serialization.

For the specification of serializations see the section PE seri‐
alization format.
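
As a rough illustration of the canonvar protocol described above, the
following sketch checks two hand-written serializations and one malformed
value. It assumes that Tcllib is installed so that pt::pe can be loaded.
The variable names and the operator name "bogus" are made up for the example.

```tcl
package require pt::pe

# A serialization written without superfluous whitespace.
set tidy  {x {n Term} {* {n AddOp}}}
# The same expression padded with extra whitespace. It is still a
# valid regular serialization, but not a canonical one.
set loose {x   {n Term}   {* {n AddOp}}}

# verify writes a boolean into the named variable of the caller.
pt::pe verify $tidy  tidyIsCanonical
pt::pe verify $loose looseIsCanonical

# "bogus" is not an operator of the format, so verify should throw.
set rejected [catch {pt::pe verify {bogus {n Term}}} message]
```

After these calls tidyIsCanonical and looseIsCanonical hold the flags
written by verify, and rejected is non-zero if the malformed value was
refused.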

::pt::pe verify-as-canonical serial
This command verifies that the content of serial is a valid
canonical serialization of a parsing expression and will throw
an error if that is not the case. The result of the command is
the empty string.

For the specification of canonical serializations see the sec‐
tion PE serialization format.

::pt::pe canonicalize serial
This command assumes that the content of serial is a valid regu‐
lar serialization of a parsing expression and will throw an er‐
ror if that is not the case.

It will then convert the input into the canonical serialization
of this parsing expression and return it as its result. If the
input is already canonical it will be returned unchanged.

For the specification of regular and canonical serializations
see the section PE serialization format.
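
A minimal sketch of how canonicalize and verify-as-canonical might be
combined, assuming Tcllib is installed. The input value is a made-up
example of a valid serialization that is padded with extra whitespace.

```tcl
package require pt::pe

# Valid, but padded with superfluous whitespace (non-canonical).
set loose {x   {n Term}   {* {n AddOp}}}

# Convert to the canonical serialization of the same expression.
set canonical [pt::pe canonicalize $loose]

# Throws an error unless its argument is a canonical serialization.
pt::pe verify-as-canonical $canonical
```

Canonicalizing before storing or comparing serializations means later
consumers can rely on the stricter verify-as-canonical check.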

::pt::pe print serial
This command assumes that the argument serial contains a valid
serialization of a parsing expression and returns a string con‐
taining that PE in a human readable form.

The exact format of this form is not specified and cannot be re‐
lied on for parsing or other machine-based activities.

For the specification of serializations see the section PE seri‐
alization format.

::pt::pe bottomup cmdprefix pe
This command walks the parsing expression pe from the bottom up
to the root, invoking the command prefix cmdprefix for each par‐
tial expression. This implies that the children of a parsing ex‐
pression PE are handled before PE.

The command prefix has the signature

cmdprefix pe op arguments
I.e. it is invoked with the parsing expression pe the
walk is currently at, the operator op of that pe, and the
operator's arguments.

The result returned by the command prefix replaces pe in
the parsing expression it was a child of, allowing trans‐
formations of the expression tree.

This also means that for all inner parsing expressions
the contents of arguments are the results of the command
prefix invoked for the children of this inner parsing ex‐
pression.
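
The replacement behaviour described above can be sketched with a small
rewriting callback. This is an illustration only: expandPlus is a
hypothetical procedure, and the sketch assumes that Tcllib is installed
and that bottomup returns the callback's result for the root expression.

```tcl
package require pt::pe

# Hypothetical callback: rewrites "+ e" into "x e {* e}" and rebuilds
# every other node from its operator and its (already processed)
# arguments.
proc expandPlus {pe op arguments} {
    if {$op eq "+"} {
        set e [lindex $arguments 0]
        return [list x $e [list * $e]]
    }
    return [list $op {*}$arguments]
}

# Digit+ nested inside an optional expression.
set rewritten [pt::pe bottomup expandPlus {? {+ {n Digit}}}]
```

Because children are processed first, the callback sees the rewritten
form of each child in arguments and only has to deal with a single node
at a time.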

::pt::pe topdown cmdprefix pe
This command walks the parsing expression pe from the root down
to the leaves, invoking the command prefix cmdprefix for each
partial expression. This implies that the children of a parsing
expression PE are handled after PE.

The command prefix has the same signature as for bottomup, see
above.

The result returned by the command prefix is ignored.
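
Since the callback result is ignored, a topdown walk is mostly useful
for its side effects. The sketch below collects the nonterminals of an
expression in visiting order. It assumes Tcllib is installed. The
procedure collectNonterminal and the global variable seen are made up
for the example.

```tcl
package require pt::pe

# Hypothetical callback: records every nonterminal the walk visits.
proc collectNonterminal {pe op arguments} {
    global seen
    if {$op eq "n"} {
        lappend seen [lindex $arguments 0]
    }
}

set seen {}
pt::pe topdown collectNonterminal {x {n Term} {* {x {n AddOp} {n Term}}}}
```

After the walk, seen lists the nonterminal names in the order the
top-down traversal reached them, including repeats.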

::pt::pe equal seriala serialb
This command tests the two parsing expressions seriala and seri‐
alb for structural equality. The result of the command is a
boolean value: true if the expressions are structurally
identical, and false otherwise.

String equality is usable only if we can assume that the two
parsing expressions are pure Tcl lists.
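
The difference between string equality and structural equality can be
sketched as follows. The example assumes Tcllib is installed and uses
two made-up serializations of the same expression, one of them padded
with extra whitespace.

```tcl
package require pt::pe

set a {x {n Term} {n AddOp}}
# Same structure as $a, but with superfluous whitespace.
set b {x   {n Term}   {n AddOp}}

# Plain string comparison of the two serializations.
set sameString    [string equal $a $b]
# Structural comparison of the two parsing expressions.
set sameStructure [pt::pe equal $a $b]
```

The two results are expected to differ here, because b is not a pure
Tcl list in the sense used above, while both values describe the same
expression.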

::pt::pe epsilon
This command constructs the atomic parsing expression for ep‐
silon.

::pt::pe dot
This command constructs the atomic parsing expression for dot.

::pt::pe alnum
This command constructs the atomic parsing expression for alnum.

::pt::pe alpha
This command constructs the atomic parsing expression for alpha.

::pt::pe ascii
This command constructs the atomic parsing expression for ascii.

::pt::pe control
This command constructs the atomic parsing expression for con‐
trol.

::pt::pe digit
This command constructs the atomic parsing expression for digit.

::pt::pe graph
This command constructs the atomic parsing expression for graph.

::pt::pe lower
This command constructs the atomic parsing expression for lower.

::pt::pe print
This command constructs the atomic parsing expression for print.

::pt::pe punct
This command constructs the atomic parsing expression for punct.

::pt::pe space
This command constructs the atomic parsing expression for space.

::pt::pe upper
This command constructs the atomic parsing expression for upper.

::pt::pe wordchar
This command constructs the atomic parsing expression for word‐
char.

::pt::pe xdigit
This command constructs the atomic parsing expression for
xdigit.

::pt::pe ddigit
This command constructs the atomic parsing expression for
ddigit.

::pt::pe terminal t
This command constructs the atomic parsing expression for the
terminal symbol t.

::pt::pe range ta tb
This command constructs the atomic parsing expression for the
range of terminal symbols ta ... tb.

::pt::pe nonterminal nt
This command constructs the atomic parsing expression for the
nonterminal symbol nt.

::pt::pe choice pe...
This command constructs the parsing expression representing the
ordered or prioritized choice between the argument parsing ex‐
pressions. The first argument has the highest priority.

::pt::pe sequence pe...
This command constructs the parsing expression representing the
sequence of the argument parsing expressions. The first argument
is the first element of the sequence.

::pt::pe repeat0 pe
This command constructs the parsing expression representing
zero or more repetitions of the argument parsing expression pe,
also known as the Kleene closure.

::pt::pe repeat1 pe
This command constructs the parsing expression representing
one or more repetitions of the argument parsing expression pe,
also known as the positive Kleene closure.

::pt::pe optional pe
This command constructs the parsing expression representing the
optionality of the argument parsing expression pe.

::pt::pe ahead pe
This command constructs the parsing expression representing the
positive lookahead of the argument parsing expression pe.

::pt::pe notahead pe
This command constructs the parsing expression representing the
negative lookahead of the argument parsing expression pe.
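
The constructor commands above can be nested to build larger
expressions without writing the list structure by hand. The following
sketch assumes Tcllib is installed. The rule for number and the
nonterminal names Sign and Digit are made up for the example.

```tcl
package require pt::pe

# Expression <- Term (AddOp Term)*
set term  [pt::pe nonterminal Term]
set addop [pt::pe nonterminal AddOp]
set expression [pt::pe sequence $term \
    [pt::pe repeat0 [pt::pe sequence $addop $term]]]

# Hypothetical second rule: Sign? Digit+
set number [pt::pe sequence \
    [pt::pe optional [pt::pe nonterminal Sign]] \
    [pt::pe repeat1  [pt::pe nonterminal Digit]]]

# Both results should pass validation; verify throws otherwise.
pt::pe verify $expression
pt::pe verify $number
```

Going through the constructors keeps the operator symbols of the
serialization format out of application code.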

PE SERIALIZATION FORMAT
Here we specify the format used by the Parser Tools to serialize Pars‐
ing Expressions as immutable values for transport, comparison, etc.

We distinguish between regular and canonical serializations. While a
parsing expression may have more than one regular serialization,
exactly one of them is canonical.

Regular serialization

Atomic Parsing Expressions

[1] The string epsilon is an atomic parsing expres‐
sion. It matches the empty string.

[2] The string dot is an atomic parsing expression. It
matches any character.

[3] The string alnum is an atomic parsing expression.
It matches any Unicode alphabet or digit charac‐
ter. This is a custom extension of PEs based on
Tcl's builtin command string is.

[4] The string alpha is an atomic parsing expression.
It matches any Unicode alphabet character. This is
a custom extension of PEs based on Tcl's builtin
command string is.

[5] The string ascii is an atomic parsing expression.
It matches any Unicode character below U+0080. This
is a custom extension of PEs based on Tcl's
builtin command string is.

[6] The string control is an atomic parsing expres‐
sion. It matches any Unicode control character.
This is a custom extension of PEs based on Tcl's
builtin command string is.

[7] The string digit is an atomic parsing expression.
It matches any Unicode digit character. Note that
this includes characters outside of the [0..9]
range. This is a custom extension of PEs based on
Tcl's builtin command string is.

[8] The string graph is an atomic parsing expression.
It matches any Unicode printing character, except
for space. This is a custom extension of PEs based
on Tcl's builtin command string is.

[9] The string lower is an atomic parsing expression.
It matches any Unicode lower-case alphabet charac‐
ter. This is a custom extension of PEs based on
Tcl's builtin command string is.

[10] The string print is an atomic parsing expression.
It matches any Unicode printing character, includ‐
ing space. This is a custom extension of PEs based
on Tcl's builtin command string is.

[11] The string punct is an atomic parsing expression.
It matches any Unicode punctuation character. This
is a custom extension of PEs based on Tcl's
builtin command string is.

[12] The string space is an atomic parsing expression.
It matches any Unicode space character. This is a
custom extension of PEs based on Tcl's builtin
command string is.

[13] The string upper is an atomic parsing expression.
It matches any Unicode upper-case alphabet charac‐
ter. This is a custom extension of PEs based on
Tcl's builtin command string is.

[14] The string wordchar is an atomic parsing expres‐
sion. It matches any Unicode word character. This
is any alphanumeric character (see alnum), and any
connector punctuation characters (e.g. under‐
score). This is a custom extension of PEs based on
Tcl's builtin command string is.

[15] The string xdigit is an atomic parsing expression.
It matches any hexadecimal digit character. This
is a custom extension of PEs based on Tcl's
builtin command string is.

[16] The string ddigit is an atomic parsing expression.
It matches any decimal digit character. This is a
custom extension of PEs based on Tcl's builtin
command regexp.

[17] The expression [list t x] is an atomic parsing ex‐
pression. It matches the terminal string x.

[18] The expression [list n A] is an atomic parsing ex‐
pression. It matches the nonterminal A.

Combined Parsing Expressions

[1] For parsing expressions e1, e2, ... the result of
[list / e1 e2 ... ] is a parsing expression as
well. This is the ordered choice, aka prioritized
choice.

[2] For parsing expressions e1, e2, ... the result of
[list x e1 e2 ... ] is a parsing expression as
well. This is the sequence.

[3] For a parsing expression e the result of [list *
e] is a parsing expression as well. This is the
Kleene closure, describing zero or more repeti‐
tions.

[4] For a parsing expression e the result of [list +
e] is a parsing expression as well. This is the
positive Kleene closure, describing one or more
repetitions.

[5] For a parsing expression e the result of [list &
e] is a parsing expression as well. This is the
and lookahead predicate.

[6] For a parsing expression e the result of [list !
e] is a parsing expression as well. This is the
not lookahead predicate.

[7] For a parsing expression e the result of [list ?
e] is a parsing expression as well. This is the
optional input.

Canonical serialization
The canonical serialization of a parsing expression has the for‐
mat as specified in the previous item, and then additionally
satisfies the constraints below, which make it unique among all
the possible serializations of this parsing expression.

[1] The string representation of the value is the canonical
representation of a pure Tcl list. I.e. it does not con‐
tain superfluous whitespace.

[2] Terminals are not encoded as ranges (where start and end
of the range are identical).

EXAMPLE
Assuming the parsing expression shown on the right-hand side of the
rule

Expression <- Term (AddOp Term)*


then its canonical serialization (except for whitespace) is

{x {n Term} {* {x {n AddOp} {n Term}}}}
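
Since the format is defined in terms of Tcl's list command, the same
value can also be assembled without the package. The sketch below
follows the rules of the regular serialization directly and needs only
core Tcl. The second rule (Sign) is a made-up addition for
illustration.

```tcl
# Expression <- Term (AddOp Term)*
# Built from the inside out with the list forms given in the format.
set inner      [list x [list n AddOp] [list n Term]]
set tail       [list * $inner]
set expression [list x [list n Term] $tail]

# Hypothetical second rule: Sign <- '+' / '-'
set sign [list / [list t +] [list t -]]
```

Building the values with list, rather than by pasting strings together,
leaves the quoting and whitespace of the result to Tcl.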

BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category pt of the
Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
report any ideas for enhancements you may have for either package
and/or documentation.

When proposing code changes, please provide unified diffs, i.e. the out‐
put of diff -u.

Note further that attachments are strongly preferred over inlined
patches. Attachments can be made by going to the Edit form of the
ticket immediately after its creation, and then using the left-most
button in the secondary navigation bar.

KEYWORDS
EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
matching, parser, parsing expression, parsing expression grammar, push
down automaton, recursive descent, state, top-down parsing languages,
transducer

CATEGORY
Parsing and Grammars

COPYRIGHT
Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib 1.0.1 pt::pe(n)