pt::peg::to::container(n)        Parser Tools        pt::peg::to::container(n)

______________________________________________________________________________

NAME
pt::peg::to::container - PEG Conversion. Write CONTAINER format

SYNOPSIS
package require Tcl 8.5

package require pt::peg::to::container ?1?

package require pt::peg

package require text::write

package require char

pt::peg::to::container reset

pt::peg::to::container configure

pt::peg::to::container configure option

pt::peg::to::container configure option value...

pt::peg::to::container convert serial

______________________________________________________________________________

DESCRIPTION
Are you lost? Do you have trouble understanding this document? In
that case please read the overview provided by the Introduction to
Parser Tools. That document is the entry point to the whole system the
current package is a part of.
38
39 This package implements the converter from parsing expression grammars
40 to CONTAINER markup.
41
It resides in the Export section of the Core Layer of Parser Tools, and
can be used either directly with the other packages of this layer, or
indirectly through the export manager provided by pt::peg::export. The
latter is intended for use in untrusted environments and is done through
the corresponding export plugin pt::peg::export::container, which sits
between converter and export manager.
48
49 IMAGE: arch_core_eplugins
50
52 The API provided by this package satisfies the specification of the
53 Converter API found in the Parser Tools Export API specification.
54
55 pt::peg::to::container reset
56 This command resets the configuration of the package to its
57 default settings.
58
59 pt::peg::to::container configure
60 This command returns a dictionary containing the current config‐
61 uration of the package.
62
63 pt::peg::to::container configure option
64 This command returns the current value of the specified configu‐
65 ration option of the package. For the set of legal options,
66 please read the section Options.
67
68 pt::peg::to::container configure option value...
69 This command sets the given configuration options of the pack‐
70 age, to the specified values. For the set of legal options,
71 please read the section Options.
72
73 pt::peg::to::container convert serial
74 This command takes the canonical serialization of a parsing
75 expression grammar, as specified in section PEG serialization
76 format, and contained in serial, and generates CONTAINER markup
77 encoding the grammar, per the current package configuration.
78 The created string is then returned as the result of the com‐
79 mand.
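
As an illustration of the commands above, the following sketch converts a tiny grammar. It assumes Tcllib (including pt::peg and pt::peg::to::container) is installed. The one-rule grammar, the name digits, and the use of pt::peg canonicalize to normalize the hand-written value are choices made for this example only.

```tcl
package require Tcl 8.5
package require pt::peg                 ;# assumed to provide "pt::peg canonicalize"
package require pt::peg::to::container

# Hypothetical one-rule grammar, written as a regular PEG serialization.
set serial {pt::grammar::peg {
    start {n Digit}
    rules {
        Digit {mode value is {/ {t 0} {t 1}}}
    }
}}

# convert expects the canonical form, so normalize the hand-written value.
set serial [pt::peg canonicalize $serial]

# Start from the defaults, then set the options for this run.
pt::peg::to::container reset
pt::peg::to::container configure -name digits -mode bulk

# The result is a string holding the generated CONTAINER code.
set code [pt::peg::to::container convert $serial]
puts $code
```

Calling configure without arguments afterwards returns the full option dictionary, which is a convenient way to check what a later convert will use.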
80
OPTIONS
82 The converter to the CONTAINER format recognizes the following options
83 and changes its behaviour as they specify.
84
85 -file string
86 The value of this option is the name of the file or other entity
87 from which the grammar came, for which the command is run. The
88 default value is unknown.
89
90 -name string
91 The value of this option is the name of the grammar we are pro‐
92 cessing. The default value is a_pe_grammar.
93
94 -user string
95 The value of this option is the name of the user for which the
96 command is run. The default value is unknown.
97
98 -mode bulk|incremental
99 The value of this option controls which methods of pt::peg::con‐
100 tainer instances are used to specify the grammar, i.e. preload
101 it into the container. There are two legal values, as listed
102 below. The default is bulk.
103
104 bulk In this mode the methods start, add, modes, and rules are
105 used to specify the grammar in a bulk manner, i.e. as a
106 set of nonterminal symbols, and two dictionaries mapping
107 from the symbols to their semantic modes and parsing
108 expressions.
109
110 This mode is the default.
111
incremental
In this mode the methods start, add, mode, and rule are
used to specify the grammar piecemeal, with each nonterminal
having its own block of defining commands.
116
117 -template string
118 The value of this option is a string into which to put the gen‐
119 erated code and the other configuration settings. The various
120 locations for user-data are expected to be specified with the
121 placeholders listed below. The default value is "@code@".
122
123 @user@ To be replaced with the value of the option -user.
124
@format@
To be replaced with the constant CONTAINER.
127
128 @file@ To be replaced with the value of the option -file.
129
130 @name@ To be replaced with the value of the option -name.
131
132 @mode@ To be replaced with the value of the option -mode.
133
134 @code@ To be replaced with the generated code.
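
The effect of the placeholders can be pictured with Tcl's builtin string map. This is only a sketch of the substitution described above, not the package's own implementation; the template text and all values (alice, calculator.peg, and so on) are hypothetical stand-ins.

```tcl
# Hypothetical stand-ins for the configured option values and the generated code.
set values [list \
    @user@   alice \
    @format@ CONTAINER \
    @file@   calculator.peg \
    @name@   calculator \
    @mode@   bulk \
    @code@   "snit::type calculator { ... }"]

# A template wrapping the generated code in a short comment header.
set template "# @format@ for grammar @name@\n# from @file@, generated for @user@ (@mode@)\n@code@\n"

# string map replaces every placeholder occurrence in a single pass.
set result [string map $values $template]
puts $result
```

With the package loaded, a template string of this shape would be handed over with pt::peg::to::container configure -template, alongside -user, -file, and -name.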
135
The container format is another form of describing parsing expression
grammars. While data in this format is executable, it does not constitute
a parser for the grammar. It always has to be used in conjunction
with the package pt::peg::interp, a grammar interpreter.

The format represents grammars by a snit::type, i.e. a class, whose
instances are API-compatible with the instances of the pt::peg::container
package, and which are preloaded with the grammar in question.

It has no direct formal specification beyond what was said above.
147
148 EXAMPLE
149 Assuming the following PEG for simple mathematical expressions
150
    PEG calculator (Expression)
        Digit <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
        Sign <- '-' / '+' ;
        Number <- Sign? Digit+ ;
        Expression <- Term (AddOp Term)* ;
        MulOp <- '*' / '/' ;
        Term <- Factor (MulOp Factor)* ;
        AddOp <- '+'/'-' ;
        Factor <- '(' Expression ')' / Number ;
    END;
161
162
163 one possible CONTAINER serialization for it is
164
    snit::type a_pe_grammar {
        constructor {} {
            install myg using pt::peg::container ${selfns}::G
            $myg start {n Expression}
            $myg add AddOp Digit Expression Factor MulOp Number Sign Term
            $myg modes {
                AddOp value
                Digit value
                Expression value
                Factor value
                MulOp value
                Number value
                Sign value
                Term value
            }
            $myg rules {
                AddOp {/ {t -} {t +}}
                Digit {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}
                Expression {x {n Term} {* {x {n AddOp} {n Term}}}}
                Factor {/ {x {t \50} {n Expression} {t \51}} {n Number}}
                MulOp {/ {t *} {t /}}
                Number {x {? {n Sign}} {+ {n Digit}}}
                Sign {/ {t -} {t +}}
                Term {x {n Factor} {* {x {n MulOp} {n Factor}}}}
            }
            return
        }

        component myg
        delegate method * to myg
    }
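
To make the "preloaded container" idea concrete, the sketch below generates such a class for a tiny grammar, evaluates the code, and queries an instance. It assumes Tcllib with snit, pt::peg, pt::peg::container, and pt::peg::to::container is installed. The grammar and the name tiny_grammar are hypothetical, and the query methods start, nonterminals, and rule are assumed from the pt::peg::container API that the instances are said to be compatible with. Actual parsing would additionally hand the instance to pt::peg::interp, which is not shown.

```tcl
package require Tcl 8.5
package require snit
package require pt::peg
package require pt::peg::container
package require pt::peg::to::container

# Hypothetical two-rule grammar, normalized to its canonical serialization.
set serial [pt::peg canonicalize {pt::grammar::peg {
    rules {
        Digit  {is {/ {t 0} {t 1}} mode value}
        Number {is {+ {n Digit}}   mode value}
    }
    start {n Number}
}}]

pt::peg::to::container reset
pt::peg::to::container configure -name tiny_grammar

# Evaluating the generated code defines the class ::tiny_grammar.
uplevel #0 [pt::peg::to::container convert $serial]

# Each instance is a grammar container preloaded by its constructor.
set g [tiny_grammar %AUTO%]
puts [$g start]          ;# start expression of the grammar
puts [$g nonterminals]   ;# nonterminal symbols known to the grammar
puts [$g rule Number]    ;# parsing expression of one symbol
```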
196
197
PEG SERIALIZATION FORMAT
199 Here we specify the format used by the Parser Tools to serialize Pars‐
200 ing Expression Grammars as immutable values for transport, comparison,
201 etc.
202
203 We distinguish between regular and canonical serializations. While a
204 PEG may have more than one regular serialization only exactly one of
205 them will be canonical.
206
207 regular serialization
208
209 [1] The serialization of any PEG is a nested Tcl dictionary.
210
211 [2] This dictionary holds a single key, pt::grammar::peg, and
212 its value. This value holds the contents of the grammar.
213
214 [3] The contents of the grammar are a Tcl dictionary holding
215 the set of nonterminal symbols and the starting expres‐
216 sion. The relevant keys and their values are
217
218 rules The value is a Tcl dictionary whose keys are the
219 names of the nonterminal symbols known to the
220 grammar.
221
222 [1] Each nonterminal symbol may occur only
223 once.
224
225 [2] The empty string is not a legal nonterminal
226 symbol.
227
228 [3] The value for each symbol is a Tcl dictio‐
229 nary itself. The relevant keys and their
230 values in this dictionary are
231
is The value is the serialization of
the parsing expression describing
the symbol's sentential structure, as
specified in the section PE
serialization format.
237
238 mode The value can be one of three values
239 specifying how a parser should han‐
240 dle the semantic value produced by
241 the symbol.
242
value The semantic value of the
nonterminal symbol is an
abstract syntax tree consisting
of a single node for
the nonterminal itself, which
has the ASTs of the symbol's
right hand side as its children.

leaf The semantic value of the
nonterminal symbol is an
abstract syntax tree consisting
of a single node for
the nonterminal, without any
children. Any ASTs generated
by the symbol's right hand
side are discarded.
260
261 void The nonterminal has no seman‐
262 tic value. Any ASTs generated
263 by the symbol's right hand
264 side are discarded (as well).
265
266 start The value is the serialization of the start pars‐
267 ing expression of the grammar, as specified in the
268 section PE serialization format.
269
270 [4] The terminal symbols of the grammar are specified implic‐
271 itly as the set of all terminal symbols used in the start
272 expression and on the RHS of the grammar rules.
273
274 canonical serialization
275 The canonical serialization of a grammar has the format as spec‐
276 ified in the previous item, and then additionally satisfies the
277 constraints below, which make it unique among all the possible
278 serializations of this grammar.
279
280 [1] The keys found in all the nested Tcl dictionaries are
281 sorted in ascending dictionary order, as generated by
282 Tcl's builtin command lsort -increasing -dict.
283
284 [2] The string representation of the value is the canonical
285 representation of a Tcl dictionary. I.e. it does not con‐
286 tain superfluous whitespace.
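
The two constraints can be sketched in plain Tcl: sort the keys of each nested dictionary with lsort -increasing -dict and rebuild the value with list, which yields the canonical string representation. The helper name canonPeg is hypothetical, and the sketch leaves the parsing expressions themselves untouched, so it is an illustration of the ordering rule rather than a full canonicalizer.

```tcl
# Hypothetical helper: reorder a regular PEG serialization into key-sorted form.
proc canonPeg {serial} {
    set g [dict get $serial pt::grammar::peg]
    set rules {}
    # Nonterminals in ascending dictionary order.
    foreach nt [lsort -increasing -dict [dict keys [dict get $g rules]]] {
        set def [dict get $g rules $nt]
        # Inside a rule the keys sort as: is, mode.
        lappend rules $nt [list is [dict get $def is] mode [dict get $def mode]]
    }
    # At the grammar level the keys sort as: rules, start.
    return [list pt::grammar::peg [list rules $rules start [dict get $g start]]]
}

# A regular, non-canonical serialization: unsorted keys, extra whitespace.
set regular {pt::grammar::peg {
    start {n A}
    rules {
        B {mode value is {t b}}
        A {mode void  is {n B}}
    }
}}

puts [canonPeg $regular]
```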
287
288 EXAMPLE
289 Assuming the following PEG for simple mathematical expressions
290
    PEG calculator (Expression)
        Digit <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
        Sign <- '-' / '+' ;
        Number <- Sign? Digit+ ;
        Expression <- Term (AddOp Term)* ;
        MulOp <- '*' / '/' ;
        Term <- Factor (MulOp Factor)* ;
        AddOp <- '+'/'-' ;
        Factor <- '(' Expression ')' / Number ;
    END;
301
302
303 then its canonical serialization (except for whitespace) is
304
    pt::grammar::peg {
        rules {
            AddOp {is {/ {t -} {t +}} mode value}
            Digit {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
            Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
            Factor {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
            MulOp {is {/ {t *} {t /}} mode value}
            Number {is {x {? {n Sign}} {+ {n Digit}}} mode value}
            Sign {is {/ {t -} {t +}} mode value}
            Term {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
        }
        start {n Expression}
    }
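
Because the terminal symbols are only given implicitly, a consumer that needs them has to walk the start expression and every rule. The following plain-Tcl sketch does that for the operators listed in this document. The helper names peTerminals and pegTerminals and the two-rule sample are hypothetical.

```tcl
# Hypothetical helper: collect the terminals used inside one serialized PE.
proc peTerminals {pe} {
    set operands [lassign $pe op]
    switch -exact -- $op {
        t { return $operands }
        / - x - * - + - & - ! - ? {
            set result {}
            foreach sub $operands {
                lappend result {*}[peTerminals $sub]
            }
            return $result
        }
        default { return {} }
    }
}

# Hypothetical helper: terminals of a whole grammar, sorted and without duplicates.
proc pegTerminals {serial} {
    set g [dict get $serial pt::grammar::peg]
    set result [peTerminals [dict get $g start]]
    dict for {nt def} [dict get $g rules] {
        lappend result {*}[peTerminals [dict get $def is]]
    }
    return [lsort -unique $result]
}

# Two rules taken from the calculator example.
set serial {pt::grammar::peg {
    rules {
        Factor {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
        Sign   {is {/ {t -} {t +}} mode value}
    }
    start {n Factor}
}}

puts [pegTerminals $serial]
```

Nonterminal references and the named atomic expressions contribute no terminals, which is why they fall through to the default branch.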
318
319
PE SERIALIZATION FORMAT
321 Here we specify the format used by the Parser Tools to serialize Pars‐
322 ing Expressions as immutable values for transport, comparison, etc.
323
324 We distinguish between regular and canonical serializations. While a
325 parsing expression may have more than one regular serialization only
326 exactly one of them will be canonical.
327
328 Regular serialization
329
330 Atomic Parsing Expressions
331
332 [1] The string epsilon is an atomic parsing expres‐
333 sion. It matches the empty string.
334
335 [2] The string dot is an atomic parsing expression. It
336 matches any character.
337
338 [3] The string alnum is an atomic parsing expression.
339 It matches any Unicode alphabet or digit charac‐
340 ter. This is a custom extension of PEs based on
341 Tcl's builtin command string is.
342
343 [4] The string alpha is an atomic parsing expression.
344 It matches any Unicode alphabet character. This is
345 a custom extension of PEs based on Tcl's builtin
346 command string is.
347
[5] The string ascii is an atomic parsing expression.
It matches any Unicode character below U+0080. This
is a custom extension of PEs based on Tcl's
builtin command string is.
352
353 [6] The string control is an atomic parsing expres‐
354 sion. It matches any Unicode control character.
355 This is a custom extension of PEs based on Tcl's
356 builtin command string is.
357
358 [7] The string digit is an atomic parsing expression.
359 It matches any Unicode digit character. Note that
360 this includes characters outside of the [0..9]
361 range. This is a custom extension of PEs based on
362 Tcl's builtin command string is.
363
364 [8] The string graph is an atomic parsing expression.
365 It matches any Unicode printing character, except
366 for space. This is a custom extension of PEs based
367 on Tcl's builtin command string is.
368
369 [9] The string lower is an atomic parsing expression.
370 It matches any Unicode lower-case alphabet charac‐
371 ter. This is a custom extension of PEs based on
372 Tcl's builtin command string is.
373
374 [10] The string print is an atomic parsing expression.
375 It matches any Unicode printing character, includ‐
376 ing space. This is a custom extension of PEs based
377 on Tcl's builtin command string is.
378
379 [11] The string punct is an atomic parsing expression.
380 It matches any Unicode punctuation character. This
381 is a custom extension of PEs based on Tcl's
382 builtin command string is.
383
384 [12] The string space is an atomic parsing expression.
385 It matches any Unicode space character. This is a
386 custom extension of PEs based on Tcl's builtin
387 command string is.
388
389 [13] The string upper is an atomic parsing expression.
390 It matches any Unicode upper-case alphabet charac‐
391 ter. This is a custom extension of PEs based on
392 Tcl's builtin command string is.
393
394 [14] The string wordchar is an atomic parsing expres‐
395 sion. It matches any Unicode word character. This
396 is any alphanumeric character (see alnum), and any
397 connector punctuation characters (e.g. under‐
398 score). This is a custom extension of PEs based on
399 Tcl's builtin command string is.
400
401 [15] The string xdigit is an atomic parsing expression.
402 It matches any hexadecimal digit character. This
403 is a custom extension of PEs based on Tcl's
404 builtin command string is.
405
406 [16] The string ddigit is an atomic parsing expression.
407 It matches any decimal digit character. This is a
408 custom extension of PEs based on Tcl's builtin
409 command regexp.
410
411 [17] The expression [list t x] is an atomic parsing
412 expression. It matches the terminal string x.
413
414 [18] The expression [list n A] is an atomic parsing
415 expression. It matches the nonterminal A.
416
417 Combined Parsing Expressions
418
419 [1] For parsing expressions e1, e2, ... the result of
420 [list / e1 e2 ... ] is a parsing expression as
421 well. This is the ordered choice, aka prioritized
422 choice.
423
424 [2] For parsing expressions e1, e2, ... the result of
425 [list x e1 e2 ... ] is a parsing expression as
426 well. This is the sequence.
427
[3] For a parsing expression e the result of [list *
e] is a parsing expression as well. This is the
Kleene closure, describing zero or more repetitions.

[4] For a parsing expression e the result of [list +
e] is a parsing expression as well. This is the
positive Kleene closure, describing one or more
repetitions.
437
438 [5] For a parsing expression e the result of [list &
439 e] is a parsing expression as well. This is the
440 and lookahead predicate.
441
442 [6] For a parsing expression e the result of [list !
443 e] is a parsing expression as well. This is the
444 not lookahead predicate.
445
446 [7] For a parsing expression e the result of [list ?
447 e] is a parsing expression as well. This is the
448 optional input.
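
Since most of the named atomic expressions are defined in terms of Tcl's builtin string is, their meaning can be tried out directly. The sketch below is a hypothetical helper, matchAtom, that tests one character against a named atom. It is not how the Parser Tools runtime is implemented, and it leaves out epsilon, which matches the empty string rather than a character.

```tcl
# Hypothetical helper: does the single character ch match the named atom?
proc matchAtom {atom ch} {
    switch -exact -- $atom {
        dot    { return [expr {[string length $ch] == 1}] }
        ddigit { return [regexp {^[0-9]$} $ch] }
        alnum - alpha - ascii - control - digit - graph - lower -
        print - punct - space - upper - wordchar - xdigit {
            # The atom name doubles as the character class of "string is".
            return [string is $atom -strict $ch]
        }
        default { error "unknown atomic expression \"$atom\"" }
    }
}

puts [matchAtom digit  7]
puts [matchAtom digit  \u0663]   ;# ARABIC-INDIC DIGIT THREE, outside 0..9
puts [matchAtom ddigit \u0663]   ;# ddigit is restricted to 0..9
puts [matchAtom graph  " "]      ;# graph excludes space
puts [matchAtom print  " "]      ;# print includes space
```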
449
450 Canonical serialization
451 The canonical serialization of a parsing expression has the for‐
452 mat as specified in the previous item, and then additionally
453 satisfies the constraints below, which make it unique among all
454 the possible serializations of this parsing expression.
455
456 [1] The string representation of the value is the canonical
457 representation of a pure Tcl list. I.e. it does not con‐
458 tain superfluous whitespace.
459
460 [2] Terminals are not encoded as ranges (where start and end
461 of the range are identical).
462
463 EXAMPLE
464 Assuming the parsing expression shown on the right-hand side of the
465 rule
466
    Expression <- Term (AddOp Term)*


then its canonical serialization (except for whitespace) is

    {x {n Term} {* {x {n AddOp} {n Term}}}}
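
One way to obtain such a value without superfluous whitespace is to assemble it with Tcl's list command instead of typing the braces by hand. A minimal sketch in plain Tcl, with the variable name pe chosen for illustration:

```tcl
# Build the expression for: Term (AddOp Term)*
set pe [list x \
    [list n Term] \
    [list * [list x [list n AddOp] [list n Term]]]]

# The string representation produced by list is the canonical list form.
puts $pe
```

The outer braces shown in the serialization above are only Tcl's quoting of the value as a literal; the string held in pe starts directly with x.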
473
474
476 This document, and the package it describes, will undoubtedly contain
477 bugs and other problems. Please report such in the category pt of the
478 Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
479 report any ideas for enhancements you may have for either package
480 and/or documentation.
481
When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.
484
485 Note further that attachments are strongly preferred over inlined
486 patches. Attachments can be made by going to the Edit form of the
487 ticket immediately after its creation, and then using the left-most
488 button in the secondary navigation bar.
489
491 CONTAINER, EBNF, LL(k), PEG, TDPL, context-free languages, conversion,
492 expression, format conversion, grammar, matching, parser, parsing
493 expression, parsing expression grammar, push down automaton, recursive
494 descent, serialization, state, top-down parsing languages, transducer
495
497 Parsing and Grammars
498
Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                               1               pt::peg::to::container(n)