pt::peg::export::container(n) Parser Tools pt::peg::export::container(n)



______________________________________________________________________________

NAME
       pt::peg::export::container - PEG Export Plugin. Write CONTAINER format

SYNOPSIS
       package require Tcl 8.5

       package require pt::peg::export::container ?1?

       package require pt::peg::to::container

       export serial configuration

______________________________________________________________________________

DESCRIPTION
       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. That document is the entry point to the whole system
       of which the current package is a part.

       This package implements the parsing expression grammar export plugin
       for the generation of CONTAINER markup.

       It resides in the Export section of the Core Layer of Parser Tools
       and is intended to be used by pt::peg::export, the export manager,
       sitting between it and the corresponding core conversion
       functionality provided by pt::peg::to::container.

       IMAGE: arch_core_eplugins

       While the direct use of this package with a regular interpreter is
       possible, this is strongly discouraged and requires a number of
       contortions to provide the expected environment. The proper way to
       use this functionality depends on the situation:

       [1]    In an untrusted environment the proper access is through the
              package pt::peg::export and the export manager objects it
              provides.

       [2]    In a trusted environment, however, simply use the package
              pt::peg::to::container and access the core conversion
              functionality directly.

API
       The API provided by this package satisfies the specification of the
       Plugin API found in the Parser Tools Export API specification.

       export serial configuration
              This command takes the canonical serialization of a parsing
              expression grammar, as specified in section PEG serialization
              format and contained in serial, together with the
              configuration dictionary configuration, and generates
              CONTAINER markup encoding the grammar. The created string is
              returned as the result of the command.
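       The following sketch illustrates this contract only. It turns a
       canonical serialization into bulk-mode CONTAINER markup shaped like
       the container example later in this document. It is not the code of
       pt::peg::to::container. The proc name container_bulk is hypothetical.
       The sketch implements only the default bulk mode and the name
       variable, ignores template, and needs no packages.

```tcl
# Hypothetical sketch of bulk-mode CONTAINER generation; NOT the actual
# pt::peg::to::container implementation. Pure Tcl, no packages required.
proc container_bulk {serial configuration} {
    set grammar [dict get $serial pt::grammar::peg]
    set rules   [dict get $grammar rules]

    # Grammar name: the "name" configuration variable, else the default.
    set name a_pe_grammar
    if {[dict exists $configuration name] && [dict get $configuration name] ne ""} {
        set name [dict get $configuration name]
    }

    # One indented "symbol value" line per nonterminal, in dictionary order.
    set symbols [lsort -dictionary [dict keys $rules]]
    set modes {}
    set pes   {}
    foreach symbol $symbols {
        lappend modes "            [list $symbol [dict get $rules $symbol mode]]"
        lappend pes   "            [list $symbol [dict get $rules $symbol is]]"
    }

    # Braces keep $myg and ${selfns} literal; string map fills the slots.
    set skeleton {snit::type @name@ {
    constructor {} {
        install myg using pt::peg::container ${selfns}::G
        $myg start @start@
        $myg add @symbols@
        $myg modes {
@modes@
        }
        $myg rules {
@rules@
        }
        return
    }

    component myg
    delegate method * to myg
}}
    return [string map [list \
        @name@    [list $name] \
        @start@   [list [dict get $grammar start]] \
        @symbols@ $symbols \
        @modes@   [join $modes \n] \
        @rules@   [join $pes \n]] $skeleton]
}
```

       Calling container_bulk $serial {name calculator} yields a snit::type
       named calculator. With an empty configuration the name falls back to
       a_pe_grammar.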

OPTIONS
       The CONTAINER export plugin recognizes the following configuration
       variables and changes its behaviour as they specify.

       enum mode
              The value of this configuration variable controls which
              methods of pt::peg::container instances the plugin will use
              to specify the grammar. There are two legal values:

              bulk   In this mode the methods start, add, modes, and rules
                     are used to specify the grammar in a bulk manner, i.e.
                     as a set of nonterminal symbols and two dictionaries
                     mapping from the symbols to their semantic modes and
                     parsing expressions.

                     This mode is the default.

              incremental
                     In this mode the methods start, add, mode, and rule
                     are used to specify the grammar piecemeal, with each
                     nonterminal having its own block of defining commands.

       string template
              If this configuration variable is set, it is assumed to
              contain a string into which to put the generated code and
              other configuration data. The various locations are expected
              to be specified with the following placeholders:

              @user@ To be replaced with the value of the configuration
                     variable user.

              @format@
                     To be replaced with the constant CONTAINER.

              @file@ To be replaced with the value of the configuration
                     variable file.

              @name@ To be replaced with the value of the configuration
                     variable name.

              @mode@ To be replaced with the value of the configuration
                     variable mode.

              @code@ To be replaced with the generated code.

              If this configuration variable is not set, or is empty, the
              plugin falls back to a standard template, which is defined as
              "@code@".
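       The difference between the two values of mode shows most clearly in
       the generated method calls. The sketch below emits incremental-mode
       blocks, one per nonterminal. The proc name is hypothetical. The block
       layout and the argument order of mode and rule (nonterminal first,
       then its value) are assumptions modelled on the bulk-mode
       dictionaries. They are not taken from the plugin.

```tcl
# Hypothetical sketch of incremental-mode output: one block per nonterminal.
# ASSUMPTION: "$myg mode SYMBOL MODE" and "$myg rule SYMBOL PE" signatures.
proc container_incremental_body {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set rules   [dict get $grammar rules]
    set lines   [list "\$myg start [list [dict get $grammar start]]"]
    foreach symbol [lsort -dictionary [dict keys $rules]] {
        # Blank separator line, then the defining commands for this symbol.
        lappend lines "" \
            "\$myg add [list $symbol]" \
            "\$myg mode [list $symbol] [dict get $rules $symbol mode]" \
            "\$myg rule [list $symbol] [list [dict get $rules $symbol is]]"
    }
    return [join $lines \n]
}

# Usage with a two-rule grammar.
set serial {pt::grammar::peg {rules {Digit {is {/ {t 0} {t 1}} mode value} Bit {is {n Digit} mode leaf}} start {n Bit}}}
puts [container_incremental_body $serial]
```

       For this grammar it prints a start call followed by one add/mode/rule
       block each for Bit and Digit, in dictionary order.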

       Note that, depending on the chosen template, this plugin may ignore
       the standard configuration variables user, format, and file, and
       their values.

       The content of the standard configuration variable name, if set, is
       used as the name of the grammar in the output. Otherwise the plugin
       falls back to the default name a_pe_grammar.
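       Here is a minimal sketch of the template step. It assumes the
       substitution is a single-pass string map over the placeholders listed
       above. The procs cfg and apply_template are hypothetical. The
       fallbacks for unset user and file (the empty string) and for mode
       (bulk, the documented default) are assumptions. Only the fallbacks for
       template ("@code@") and name (a_pe_grammar) come from this page.

```tcl
# Hypothetical sketch of the template step; not the plugin's own code.
# ASSUMPTION: unset user/file expand to "", unset mode to "bulk".
proc cfg {configuration key default} {
    if {[dict exists $configuration $key]} {
        return [dict get $configuration $key]
    }
    return $default
}

proc apply_template {configuration code} {
    set template [cfg $configuration template ""]
    # Unset or empty template: fall back to the standard "@code@".
    if {$template eq ""} {
        set template @code@
    }
    set map [list \
        @user@   [cfg $configuration user ""] \
        @format@ CONTAINER \
        @file@   [cfg $configuration file ""] \
        @name@   [cfg $configuration name a_pe_grammar] \
        @mode@   [cfg $configuration mode bulk] \
        @code@   $code]
    # string map substitutes in a single pass, so placeholder-like text
    # inside the generated code is not expanded a second time.
    return [string map $map $template]
}
```

       For example, a template of "# @format@ grammar @name@\n@code@" with
       no other variables set produces the comment line
       "# CONTAINER grammar a_pe_grammar", followed by the generated code.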

CONTAINER FILE FORMAT
       The container format is another form of describing parsing
       expression grammars. While data in this format is executable, it
       does not constitute a parser for the grammar. It always has to be
       used in conjunction with the package pt::peg::interp, a grammar
       interpreter.

       The format represents grammars by a snit::type, i.e. a class, whose
       instances are API-compatible with the instances of the
       pt::peg::container package, and which are preloaded with the
       grammar in question.

       It has no direct formal specification beyond what was said above.

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
                  Sign       <- '-' / '+'                               ;
                  Number     <- Sign? Digit+                            ;
                  Expression <- Term (AddOp Term)*                      ;
                  MulOp      <- '*' / '/'                               ;
                  Term       <- Factor (MulOp Factor)*                  ;
                  AddOp      <- '+'/'-'                                 ;
                  Factor     <- '(' Expression ')' / Number             ;
              END;


       one possible CONTAINER serialization for it is

              snit::type a_pe_grammar {
                  constructor {} {
                      install myg using pt::peg::container ${selfns}::G
                      $myg start {n Expression}
                      $myg add AddOp Digit Expression Factor MulOp Number Sign Term
                      $myg modes {
                          AddOp      value
                          Digit      value
                          Expression value
                          Factor     value
                          MulOp      value
                          Number     value
                          Sign       value
                          Term       value
                      }
                      $myg rules {
                          AddOp      {/ {t +} {t -}}
                          Digit      {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}
                          Expression {x {n Term} {* {x {n AddOp} {n Term}}}}
                          Factor     {/ {x {t \50} {n Expression} {t \51}} {n Number}}
                          MulOp      {/ {t *} {t /}}
                          Number     {x {? {n Sign}} {+ {n Digit}}}
                          Sign       {/ {t -} {t +}}
                          Term       {x {n Factor} {* {x {n MulOp} {n Factor}}}}
                      }
                      return
                  }

                  component myg
                  delegate method * to myg
              }


PEG SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of
       them is canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl
                     dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg,
                     and its value. This value holds the contents of the
                     grammar.

              [3]    The contents of the grammar are a Tcl dictionary
                     holding the set of nonterminal symbols and the
                     starting expression. The relevant keys and their
                     values are

                     rules  The value is a Tcl dictionary whose keys are
                            the names of the nonterminal symbols known to
                            the grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal
                                   nonterminal symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys
                                   and their values in this dictionary
                                   are

                                   is     The value is the serialization
                                          of the parsing expression
                                          describing the symbol's
                                          sentential structure, as
                                          specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three
                                          values specifying how a parser
                                          should handle the semantic
                                          value produced by the symbol.

                                          value  The semantic value of
                                                 the nonterminal symbol
                                                 is an abstract syntax
                                                 tree consisting of a
                                                 single node for the
                                                 nonterminal itself,
                                                 which has the ASTs of
                                                 the symbol's right hand
                                                 side as its children.

                                          leaf   The semantic value of
                                                 the nonterminal symbol
                                                 is an abstract syntax
                                                 tree consisting of a
                                                 single node for the
                                                 nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the
                                                 symbol's right hand
                                                 side are discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any
                                                 ASTs generated by the
                                                 symbol's right hand
                                                 side are discarded (as
                                                 well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as
                            specified in the section PE serialization
                            format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in
                     the start expression and on the RHS of the grammar
                     rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among
              all the possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the
                     canonical representation of a Tcl dictionary. I.e. it
                     does not contain superfluous whitespace.
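       The rules for regular serializations translate directly into checks.
       The sketch below uses two hypothetical procs. peg_check validates
       items [1] to [3]. peg_terminals computes the implicit terminal set of
       item [4]. The sketch does not validate the parsing expressions
       themselves. It treats every bare word (epsilon, dot, alnum, ...) as
       contributing no terminal.

```tcl
# Hypothetical checker for the regular PEG serialization rules above.
proc peg_check {serial} {
    # [1], [2]: a dictionary with the single key pt::grammar::peg.
    if {[dict size $serial] != 1 || ![dict exists $serial pt::grammar::peg]} {
        error "expected exactly one key, pt::grammar::peg"
    }
    set grammar [dict get $serial pt::grammar::peg]
    # [3]: the grammar contents provide the keys rules and start.
    foreach key {rules start} {
        if {![dict exists $grammar $key]} {
            error "grammar lacks the key $key"
        }
    }
    set rules [dict get $grammar rules]
    # [3][1]: count list words first; a repeated symbol yields more
    # words than twice the number of distinct dictionary keys.
    set words [llength $rules]
    if {$words != 2 * [dict size $rules]} {
        error "a nonterminal symbol occurs more than once"
    }
    dict for {symbol definition} $rules {
        # [3][2]: the empty string is not a legal nonterminal.
        if {$symbol eq ""} {
            error "empty nonterminal symbol"
        }
        # [3][3]: each definition carries "is" and a legal "mode".
        if {![dict exists $definition is] || ![dict exists $definition mode]} {
            error "rule $symbol lacks is or mode"
        }
        if {[dict get $definition mode] ni {value leaf void}} {
            error "rule $symbol has an illegal mode"
        }
    }
    return 1
}

# [4]: terminals are the "t" leaves of the start expression and of
# every right hand side.
proc pe_terminals {pe} {
    set op [lindex $pe 0]
    if {$op eq "t"} {
        return [list [lindex $pe 1]]
    }
    # Nonterminal calls and bare atomic words contribute nothing.
    if {$op eq "n" || [llength $pe] == 1} {
        return {}
    }
    set result {}
    foreach child [lrange $pe 1 end] {
        lappend result {*}[pe_terminals $child]
    }
    return $result
}

proc peg_terminals {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set result [pe_terminals [dict get $grammar start]]
    dict for {symbol definition} [dict get $grammar rules] {
        lappend result {*}[pe_terminals [dict get $definition is]]
    }
    return [lsort -unique $result]
}
```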

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
                  Sign       <- '-' / '+'                               ;
                  Number     <- Sign? Digit+                            ;
                  Expression <- Term (AddOp Term)*                      ;
                  MulOp      <- '*' / '/'                               ;
                  Term       <- Factor (MulOp Factor)*                  ;
                  AddOp      <- '+'/'-'                                 ;
                  Factor     <- '(' Expression ')' / Number             ;
              END;


       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}} mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
                      MulOp      {is {/ {t *} {t /}} mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
                      Sign       {is {/ {t -} {t +}} mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
                  }
                  start {n Expression}
              }
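       Producing the canonical form amounts to rebuilding the nested
       dictionaries with their keys in dictionary order. Values built with
       dict create already carry Tcl's canonical string representation. The
       proc peg_canonical below is a hypothetical sketch. It assumes its
       input is a valid regular serialization whose parsing expressions are
       already canonical, and it copies those expressions verbatim.

```tcl
# Hypothetical helper: canonicalize a regular PEG serialization.
# ASSUMPTION: the parsing expressions ("is", "start") are already
# canonical; they are copied verbatim.
proc peg_canonical {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set rules   [dict get $grammar rules]
    set sorted  [dict create]
    # Rule [1]: insert the nonterminals in ascending dictionary order ...
    foreach symbol [lsort -increasing -dictionary [dict keys $rules]] {
        # ... and the per-rule keys as "is" before "mode".
        dict set sorted $symbol [dict create \
            is   [dict get $rules $symbol is] \
            mode [dict get $rules $symbol mode]]
    }
    # "rules" sorts before "start". Rule [2]: dict create produces the
    # canonical string representation, without superfluous whitespace.
    return [dict create pt::grammar::peg \
        [dict create rules $sorted start [dict get $grammar start]]]
}
```

       When the example grammar's rules are supplied in any order, this
       sketch returns the serialization shown above, without the extra
       whitespace.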


PE SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison,
       etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them is canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing
                            expression. It matches any character.

                     [3]    The string alnum is an atomic parsing
                            expression. It matches any Unicode alphabet or
                            digit character. This is a custom extension of
                            PEs based on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing
                            expression. It matches any Unicode alphabet
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [5]    The string ascii is an atomic parsing
                            expression. It matches any Unicode character
                            below U+0080. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing
                            expression. It matches any Unicode digit
                            character. Note that this includes characters
                            outside of the [0..9] range. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [8]    The string graph is an atomic parsing
                            expression. It matches any Unicode printing
                            character, except for space. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [9]    The string lower is an atomic parsing
                            expression. It matches any Unicode lower-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string
                            is.

                     [10]   The string print is an atomic parsing
                            expression. It matches any Unicode printing
                            character, including space. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [11]   The string punct is an atomic parsing
                            expression. It matches any Unicode punctuation
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [12]   The string space is an atomic parsing
                            expression. It matches any Unicode space
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [13]   The string upper is an atomic parsing
                            expression. It matches any Unicode upper-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string
                            is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word
                            character: any alphanumeric character (see
                            alnum) and any connector punctuation character
                            (e.g. underscore). This is a custom extension
                            of PEs based on Tcl's builtin command string
                            is.

                     [15]   The string xdigit is an atomic parsing
                            expression. It matches any hexadecimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [16]   The string ddigit is an atomic parsing
                            expression. It matches any decimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result
                            of [list / e1 e2 ... ] is a parsing expression
                            as well. This is the ordered choice, aka
                            prioritized choice.

                     [2]    For parsing expressions e1, e2, ... the result
                            of [list x e1 e2 ... ] is a parsing expression
                            as well. This is the sequence.

                     [3]    For a parsing expression e the result of
                            [list * e] is a parsing expression as well.
                            This is the Kleene closure, describing zero or
                            more repetitions.

                     [4]    For a parsing expression e the result of
                            [list + e] is a parsing expression as well.
                            This is the positive Kleene closure, describing
                            one or more repetitions.

                     [5]    For a parsing expression e the result of
                            [list & e] is a parsing expression as well.
                            This is the and lookahead predicate.

                     [6]    For a parsing expression e the result of
                            [list ! e] is a parsing expression as well.
                            This is the not lookahead predicate.

                     [7]    For a parsing expression e the result of
                            [list ? e] is a parsing expression as well.
                            This is the optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then
              additionally satisfies the constraints below, which make it
              unique among all the possible serializations of this parsing
              expression.

              [1]    The string representation of the value is the
                     canonical representation of a pure Tcl list. I.e. it
                     does not contain superfluous whitespace.

              [2]    A single terminal is not encoded as a range whose
                     start and end are identical.
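       The following small recursive-descent matcher over the serialized
       form illustrates the semantics of the atomic and combined expressions
       listed above. pe_match is a hypothetical helper written for this
       page. It is not pt::peg::interp. It builds no ASTs, does no
       memoization, and handles only the expression forms listed above (no
       ranges). Its * and + loops stop when an iteration consumes nothing,
       so they cannot loop forever.

```tcl
# Hypothetical matcher for serialized parsing expressions; illustrates the
# operator semantics only (no ASTs, no memoization, not pt::peg::interp).
# rules: dictionary mapping nonterminal -> parsing expression.
# Returns the index after the matched input, or -1 on failure.
proc pe_match {rules pe text pos} {
    set op [lindex $pe 0]
    set ch [string index $text $pos]
    switch -exact -- $op {
        epsilon {
            return $pos
        }
        dot {
            if {$ch ne ""} { return [expr {$pos + 1}] }
            return -1
        }
        ddigit {
            if {[regexp {^[0-9]$} $ch]} { return [expr {$pos + 1}] }
            return -1
        }
        alnum - alpha - ascii - control - digit - graph - lower -
        print - punct - space - upper - wordchar - xdigit {
            # Character classes map onto Tcl's [string is].
            if {$ch ne "" && [string is $op -strict $ch]} {
                return [expr {$pos + 1}]
            }
            return -1
        }
        t {
            set s [lindex $pe 1]
            set end [expr {$pos + [string length $s]}]
            if {[string range $text $pos [expr {$end - 1}]] eq $s} {
                return $end
            }
            return -1
        }
        n {
            return [pe_match $rules [dict get $rules [lindex $pe 1]] $text $pos]
        }
        / {
            # Ordered choice: the first alternative that matches wins.
            foreach e [lrange $pe 1 end] {
                set r [pe_match $rules $e $text $pos]
                if {$r >= 0} { return $r }
            }
            return -1
        }
        x {
            # Sequence: every part must match, one after the other.
            foreach e [lrange $pe 1 end] {
                set pos [pe_match $rules $e $text $pos]
                if {$pos < 0} { return -1 }
            }
            return $pos
        }
        * - + {
            if {$op eq "+"} {
                set pos [pe_match $rules [lindex $pe 1] $text $pos]
                if {$pos < 0} { return -1 }
            }
            # Repeat greedily; stop on failure or when nothing is consumed.
            while {1} {
                set r [pe_match $rules [lindex $pe 1] $text $pos]
                if {$r < 0 || $r == $pos} { break }
                set pos $r
            }
            return $pos
        }
        & {
            # Lookahead predicates never consume input.
            if {[pe_match $rules [lindex $pe 1] $text $pos] >= 0} { return $pos }
            return -1
        }
        ! {
            if {[pe_match $rules [lindex $pe 1] $text $pos] < 0} { return $pos }
            return -1
        }
        ? {
            set r [pe_match $rules [lindex $pe 1] $text $pos]
            if {$r >= 0} { return $r }
            return $pos
        }
        default {
            error "unknown parsing expression: $pe"
        }
    }
}
```

       For example, pe_match {} {x alpha {+ digit}} "a12" 0 returns 3, the
       index just past the matched input.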

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

              Expression <- Term (AddOp Term)*


       then its canonical serialization (except for whitespace) is

              {x {n Term} {* {x {n AddOp} {n Term}}}}
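       Canonicalizing a parsing expression is a recursive rebuild with list,
       which drops superfluous whitespace, plus the rewrite required by
       constraint [2]. This section does not describe the range form. The
       sketch assumes a range is written [list .. start end]. The proc
       pe_canonical is hypothetical.

```tcl
# Hypothetical helper: canonicalize a serialized parsing expression.
proc pe_canonical {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        t - n {
            return [list $op [lindex $pe 1]]
        }
        .. {
            # ASSUMPTION: a range is written as [list .. start end].
            # Constraint [2]: a one-character range becomes a terminal.
            lassign [lrange $pe 1 2] from to
            if {$from eq $to} { return [list t $from] }
            return [list .. $from $to]
        }
        / - x - * - + - & - ! - ? {
            # Rebuilding with list yields the canonical string form.
            set result [list $op]
            foreach child [lrange $pe 1 end] {
                lappend result [pe_canonical $child]
            }
            return $result
        }
        default {
            # Atomic expressions such as epsilon, dot, alnum are bare words.
            return $op
        }
    }
}
```

       For instance, pe_canonical {x   {n Term}  {* {x {n AddOp} {n Term}}}}
       returns x {n Term} {* {x {n AddOp} {n Term}}}, which is the canonical
       serialization from the example without its outer braces.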


BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly
       contain bugs and other problems. Please report such in the category
       pt of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].
       Please also report any ideas for enhancements you may have for
       either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS
       CONTAINER, EBNF, LL(k), PEG, TDPL, context-free languages, export,
       expression, grammar, matching, parser, parsing expression, parsing
       expression grammar, plugin, push down automaton, recursive descent,
       serialization, state, top-down parsing languages, transducer

CATEGORY
       Parsing and Grammars

COPYRIGHT
       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>




tcllib 1 pt::peg::export::container(n)