pt::peg::container(n)            Parser Tools            pt::peg::container(n)

______________________________________________________________________________

NAME
pt::peg::container - PEG Storage

SYNOPSIS
package require Tcl 8.5

package require snit

package require pt::peg::container ?1?

::pt::peg::container objectName ?=|:=|<--|as|deserialize src?

objectName destroy

objectName clear

objectName importer

objectName importer object

objectName exporter

objectName exporter object

objectName = source

objectName --> destination

objectName serialize ?format?

objectName deserialize = data ?format?

objectName deserialize += data ?format?

objectName start

objectName start pe

objectName nonterminals

objectName modes

objectName modes dict

objectName rules

objectName rules dict

objectName add ?nt...?

objectName remove ?nt...?

objectName exists nt

objectName rename ntold ntnew

objectName mode nt

objectName mode nt mode

objectName rule nt

objectName rule nt pe

______________________________________________________________________________

DESCRIPTION
Are you lost? Do you have trouble understanding this document? In that
case please read the overview provided by the Introduction to Parser
Tools. That document is the entry point to the whole system the current
package is a part of.
78
79 This package provides a container class for parsing expression gram‐
80 mars, with each instance storing a single grammar and allowing the user
81 to manipulate and query its definition.
82
83 It resides in the Storage section of the Core Layer of Parser Tools,
84 and is one of the three pillars the management of parsing expression
85 grammars resides on.
86
87 IMAGE: arch_core_container
88
89 The other two pillars are, as shown above
90
91 [1] PEG Import, and
92
93 [2] PEG Export
94
95 Packages related to this are:
96
97 pt::rde
98 This package provides an implementation of PARAM, a virtual
99 machine for the parsing of a channel, geared towards the needs
100 of handling PEGs.
101
pt::peg::interp
This package implements an interpreter for PEGs on top of the
virtual machine provided by pt::rde.
105
106 CLASS API
107 The package exports the API described here.
108
::pt::peg::container objectName ?=|:=|<--|as|deserialize src?
110 The command creates a new container object for a parsing expres‐
111 sion grammar and returns the fully qualified name of the object
112 command as its result. The API of this object command is
113 described in the section Object API. It may be used to invoke
114 various operations on the object.
115
116 The new container will be empty if no src is specified. Other‐
117 wise it will contain a copy of the grammar contained in the src.
118 All operators except deserialize interpret src as a container
119 object command. The deserialize operator interprets src as the
120 serialization of a parsing expression grammar instead, as speci‐
121 fied in section PEG serialization format.
122
123 An empty grammar has no nonterminal symbols, and the start
124 expression is the empty expression, i.e. epsilon. It is valid,
125 but not useful.
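
As a minimal sketch of the class API, the following creates an empty container and then a second one initialized from the first. It assumes that Tcllib is installed and that the class command is named after the package, i.e. ::pt::peg::container. The object names G and H are arbitrary.

```tcl
package require pt::peg::container

# Create an empty container. The result is the fully qualified
# name of the new object command.
set g [::pt::peg::container G]

# An empty grammar has no nonterminal symbols and uses epsilon
# as its start expression.
puts "nonterminals: [$g nonterminals]"
puts "start:        [$g start]"

# Create a second container holding a copy of the first grammar.
# The operators =, :=, <-- and as all take a container object.
set h [::pt::peg::container H = $g]

# Release both objects again.
$h destroy
$g destroy
```

Keeping the returned command name in a variable avoids depending on the namespace the object was created in.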
126
127 OBJECT API
128 All objects created by this package provide the following methods for
129 the manipulation and querying of their contents:
130
131 objectName destroy
132 This method destroys the object, releasing all claimed memory,
133 and deleting the associated object command.
134
135 objectName clear
136 This method resets the object to contain the empty grammar. It
137 does not destroy the object itself.
138
139 objectName importer
140 This method returns the import manager object currently attached
141 to the container, if any.
142
143 objectName importer object
144 This method attaches the object as import manager to the con‐
145 tainer, and returns it as the result of the command. Note that
146 the object is not put into ownership of the container. I.e.,
147 destruction of the container will not destroy the object.
148
It is expected that object provides a method named import text
which takes a text and a format name, and returns the canonical
serialization of the grammar contained in the text, assuming the
given format.
153
154 objectName exporter
155 This method returns the export manager object currently attached
156 to the container, if any.
157
158 objectName exporter object
159 This method attaches the object as export manager to the con‐
160 tainer, and returns it as the result of the command. Note that
161 the object is not put into ownership of the container. I.e.,
162 destruction of the container will not destroy the object.
163
It is expected that object provides a method named export object
which takes the container and a format name, and returns a text
encoding the grammar stored in the container, in the given format.
It is further expected that the object will use the container's
method serialize to obtain the serialization of the grammar from
which to generate the text.
170
171 objectName = source
172 This method assigns the contents of the PEG object source to
173 ourselves, overwriting the existing definition. This is the
174 assignment operator for grammars.
175
176 This operation is in effect equivalent to
177
178
179
180 objectName deserialize = [source serialize]
181
182
183 objectName --> destination
184 This method assigns our contents to the PEG object destination,
185 overwriting the existing definition. This is the reverse assign‐
186 ment operator for grammars.
187
188 This operation is in effect equivalent to
189
190
191
192 destination deserialize = [objectName serialize]
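
The two assignment operators can be sketched as follows. The example assumes Tcllib is installed and that the class command is ::pt::peg::container. The object names A and B and the one-rule grammar are made up for illustration.

```tcl
package require pt::peg::container

set a [::pt::peg::container A]
set b [::pt::peg::container B]

# Give A a tiny grammar: X <- 'x', with X as start expression.
$a add X
$a rule X {t x}
$a start {n X}

# Assignment: B takes over the grammar stored in A ...
$b = $a

# ... which is in effect the same as going through the serialization.
$b deserialize = [$a serialize]

# Reverse assignment: A pushes its grammar into B.
$a --> $b

puts [$b serialize]
```

After any of the three operations both containers hold equal grammars, yet remain independent objects. Later changes to one do not affect the other.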
193
194
195 objectName serialize ?format?
This method returns our grammar in some textual form usable for
transfer, persistent storage, etc. If no format is specified the
returned result is the canonical serialization of the grammar, as
specified in the section PEG serialization format.
200
201 Otherwise the object will use the attached export manager to
202 convert the data to the specified format. In that case the
203 method will fail with an error if the container has no export
204 manager attached to it.
205
206 objectName deserialize = data ?format?
This is the complementary method to serialize. It replaces the
current definition with the grammar contained in the data. If no
format was specified it is assumed to be the regular serialization
of a grammar, as specified in the section PEG serialization
format.
212
213 Otherwise the object will use the attached import manager to
214 convert the data from the specified format to a serialization it
215 can handle. In that case the method will fail with an error if
216 the container has no import manager attached to it.
217
218 The result of the method is the empty string.
219
220 objectName deserialize += data ?format?
This method behaves like deserialize = in its essentials, except
that it merges the grammar in the data into its contents instead
of replacing them. The method will fail with an error and leave
the grammar unchanged if merging is not possible, i.e. would
produce an invalid grammar.
226
227 The result of the method is the empty string.
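
A round trip through serialize and deserialize = might look like the sketch below. It assumes Tcllib is installed and that the class command is ::pt::peg::container. The object names SRC and DST and the grammar are illustrative only. No format argument is given, so no import or export manager is needed.

```tcl
package require pt::peg::container

# Build a small grammar: A <- 'a', start expression A.
set src [::pt::peg::container SRC]
$src add A
$src rule A {t a}
$src start {n A}

# Without a format argument the result is the canonical
# serialization, a plain Tcl value.
set serial [$src serialize]
puts $serial

# Load the serialization into a fresh container, replacing
# whatever definition it held before.
set dst [::pt::peg::container DST]
$dst deserialize = $serial

puts "nonterminals in DST: [$dst nonterminals]"
```

Because the serialization is an ordinary Tcl value it can be written to a file or sent over a channel between the two steps.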
228
229 objectName start
230 This method returns the current start expression of the grammar.
231
232 objectName start pe
233 This method defines the start expression of the grammar. It
234 replaces the current start expression with the parsing expres‐
235 sion pe, and returns the new start expression.
236
237 The method will fail with an error and leave the grammar
238 unchanged if pe does not contain a valid parsing expression as
239 specified in the section PE serialization format.
240
241 objectName nonterminals
242 This method returns the set of all nonterminal symbols known to
243 the grammar.
244
245 objectName modes
246 This method returns a dictionary mapping the set of all nonter‐
247 minal symbols known to the grammar to their semantic modes.
248
249 objectName modes dict
250 This method takes a dictionary mapping a set of nonterminal sym‐
251 bols known to the grammar to their semantic modes, and returns
252 the new full mapping of nonterminal symbols to semantic modes.
253
The method will fail with an error if any of the nonterminal
symbols in the dictionary is not known to the grammar, or is the
empty string, i.e. an invalid nonterminal symbol, or if any of the
chosen modes is not one of the legal values.
258
259 objectName rules
260 This method returns a dictionary mapping the set of all nonter‐
261 minal symbols known to the grammar to their parsing expressions
262 (right-hand sides).
263
264 objectName rules dict
265 This method takes a dictionary mapping a set of nonterminal sym‐
266 bols known to the grammar to their parsing expressions (right-
267 hand sides), and returns the new full mapping of nonterminal
268 symbols to parsing expressions.
269
The method will fail with an error if any of the nonterminal
symbols in the dictionary is not known to the grammar, or is the
empty string, i.e. an invalid nonterminal symbol, or if any of the
chosen parsing expressions is not a valid parsing expression as
specified in the section PE serialization format.
275
276 objectName add ?nt...?
This method adds the nonterminal symbols nt, etc. to the grammar,
and defines the default semantic mode and expression for each of
them (value and epsilon respectively). The method returns the
empty string as its result.

The method will fail with an error and leave the grammar
unchanged if any of the nonterminal symbols is either already
defined in our grammar, or is the empty string (an invalid
nonterminal symbol).
286
287 The method does nothing if no symbol was specified as argument.
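
The defaults assigned by add can be observed with a short sketch like the one below. It assumes Tcllib is installed and that the class command is ::pt::peg::container. The symbol names are illustrative.

```tcl
package require pt::peg::container

set g [::pt::peg::container G]

# Add two symbols in one call.
$g add Digit Number

# Both start out with the default mode and the default expression.
puts "mode of Digit: [$g mode Digit]"
puts "rule of Digit: [$g rule Digit]"

# Adding a symbol that is already defined is reported as an error.
if {[catch {$g add Number} msg]} {
    puts "rejected: $msg"
}
```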
288
289 objectName remove ?nt...?
290 This method removes the named nonterminal symbols nt, etc. from
291 the set of nonterminal symbols known to our grammar. The method
292 returns the empty string as its result.
293
294 The method will fail with an error and leave the grammar
295 unchanged if any of the nonterminal symbols is not known to the
296 grammar, or is the empty string, i.e. an invalid nonterminal
297 symbol.
298
299 objectName exists nt
300 This method tests whether the nonterminal symbol nt is known to
301 our grammar or not. The result is a boolean value. It will be
302 set to true if nt is known, and false otherwise.
303
304 The method will fail with an error if nt is the empty string,
305 i.e. an invalid nonterminal symbol.
306
307 objectName rename ntold ntnew
308 This method renames the nonterminal symbol ntold to ntnew. The
309 method returns the empty string as its result.
310
311 The method will fail with an error and leave the grammar
312 unchanged if either ntold is not known to the grammar, or ntnew
313 is already known, or any of them is the empty string, i.e. an
314 invalid nonterminal symbol.
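
The methods remove, exists, and rename described above can be combined as in this sketch. It assumes Tcllib is installed and that the class command is ::pt::peg::container. The symbol names are illustrative.

```tcl
package require pt::peg::container

set g [::pt::peg::container G]
$g add Expr Term

# Rename Expr to Expression. The old name is no longer known.
$g rename Expr Expression
puts "Expr known:       [$g exists Expr]"
puts "Expression known: [$g exists Expression]"

# Remove Term from the grammar again.
$g remove Term
puts "Term known:       [$g exists Term]"
```

The manpage does not state whether rename also rewrites references to the symbol in other rules, so the sketch deliberately avoids relying on that.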
315
316 objectName mode nt
317 This method returns the current semantic mode for the nontermi‐
318 nal symbol nt.
319
320 The method will fail with an error if nt is not known to the
321 grammar, or the empty string, i.e. an invalid nonterminal sym‐
322 bol.
323
324 objectName mode nt mode
This method sets the semantic mode for the nonterminal symbol nt,
and returns the new mode. The method will fail with an error if
nt is not known to the grammar, or is the empty string, i.e. an
invalid nonterminal symbol, or the chosen mode is not one of the
legal values.
330
331 The following modes are legal:
332
value  The semantic value of the nonterminal symbol is an
       abstract syntax tree consisting of a single node for
       the nonterminal itself, which has the ASTs of the
       symbol's right hand side as its children.

leaf   The semantic value of the nonterminal symbol is an
       abstract syntax tree consisting of a single node for
       the nonterminal, without any children. Any ASTs generated
       by the symbol's right hand side are discarded.
342
343 void The nonterminal has no semantic value. Any ASTs generated
344 by the symbol's right hand side are discarded (as well).
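
Choosing modes for individual symbols could look like the following sketch. It assumes Tcllib is installed and that the class command is ::pt::peg::container. The symbol names are illustrative.

```tcl
package require pt::peg::container

set g [::pt::peg::container G]
$g add Expression Number Whitespace

# Expression keeps the default mode (value).
# Number becomes a single AST node without children.
$g mode Number leaf

# Whitespace contributes nothing to the AST at all.
$g mode Whitespace void

# Query the complete mapping of symbols to modes.
dict for {nt mode} [$g modes] {
    puts "$nt -> $mode"
}
```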
345
346 objectName rule nt
347 This method returns the current parsing expression (right-hand
348 side) for the nonterminal symbol nt.
349
350 The method will fail with an error if nt is not known to the
351 grammar, or the empty string, i.e. an invalid nonterminal sym‐
352 bol.
353
354 objectName rule nt pe
This method sets the parsing expression (right-hand side) of the
nonterminal nt to pe, and returns the new parsing expression.
357
358 The method will fail with an error if nt is not known to the
359 grammar, or the empty string, i.e. an invalid nonterminal sym‐
360 bol, or pe does not contain a valid parsing expression as speci‐
361 fied in the section PE serialization format.
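
Defining rules and the start expression might look like the sketch below. It assumes Tcllib is installed and that the class command is ::pt::peg::container. The grammar is a fragment chosen for illustration, and the parsing expressions are written directly in the format of the section PE serialization format.

```tcl
package require pt::peg::container

set g [::pt::peg::container G]
$g add Sign Digit Number

# Sign   <- '-' / '+'
$g rule Sign {/ {t -} {t +}}
# Digit  <- '0' / '1'   (shortened for the example)
$g rule Digit {/ {t 0} {t 1}}
# Number <- Sign? Digit+
$g rule Number {x {? {n Sign}} {+ {n Digit}}}

$g start {n Number}

# A rule for an unknown symbol is reported as an error.
if {[catch {$g rule Unknown {t a}} msg]} {
    puts "rejected: $msg"
}

puts [$g rules]
```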
362
PEG SERIALIZATION FORMAT
Here we specify the format used by the Parser Tools to serialize
Parsing Expression Grammars as immutable values for transport,
comparison, etc.

We distinguish between regular and canonical serializations. While a
PEG may have more than one regular serialization, exactly one of
them will be canonical.
371
372 regular serialization
373
374 [1] The serialization of any PEG is a nested Tcl dictionary.
375
376 [2] This dictionary holds a single key, pt::grammar::peg, and
377 its value. This value holds the contents of the grammar.
378
379 [3] The contents of the grammar are a Tcl dictionary holding
380 the set of nonterminal symbols and the starting expres‐
381 sion. The relevant keys and their values are
382
383 rules The value is a Tcl dictionary whose keys are the
384 names of the nonterminal symbols known to the
385 grammar.
386
387 [1] Each nonterminal symbol may occur only
388 once.
389
390 [2] The empty string is not a legal nonterminal
391 symbol.
392
393 [3] The value for each symbol is a Tcl dictio‐
394 nary itself. The relevant keys and their
395 values in this dictionary are
396
is     The value is the serialization of
       the parsing expression describing
       the symbol's sentential structure, as
       specified in the section PE
       serialization format.
402
403 mode The value can be one of three values
404 specifying how a parser should han‐
405 dle the semantic value produced by
406 the symbol.
407
value  The semantic value of the
       nonterminal symbol is an
       abstract syntax tree consisting
       of a single node for the
       nonterminal itself, which has
       the ASTs of the symbol's right
       hand side as its children.

leaf   The semantic value of the
       nonterminal symbol is an
       abstract syntax tree consisting
       of a single node for the
       nonterminal, without any
       children. Any ASTs generated
       by the symbol's right hand
       side are discarded.
425
426 void The nonterminal has no seman‐
427 tic value. Any ASTs generated
428 by the symbol's right hand
429 side are discarded (as well).
430
431 start The value is the serialization of the start pars‐
432 ing expression of the grammar, as specified in the
433 section PE serialization format.
434
435 [4] The terminal symbols of the grammar are specified implic‐
436 itly as the set of all terminal symbols used in the start
437 expression and on the RHS of the grammar rules.
438
439 canonical serialization
440 The canonical serialization of a grammar has the format as spec‐
441 ified in the previous item, and then additionally satisfies the
442 constraints below, which make it unique among all the possible
443 serializations of this grammar.
444
445 [1] The keys found in all the nested Tcl dictionaries are
446 sorted in ascending dictionary order, as generated by
447 Tcl's builtin command lsort -increasing -dict.
448
449 [2] The string representation of the value is the canonical
450 representation of a Tcl dictionary. I.e. it does not con‐
451 tain superfluous whitespace.
452
453 EXAMPLE
454 Assuming the following PEG for simple mathematical expressions
455
PEG calculator (Expression)
    Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
    Sign       <- '-' / '+'                               ;
    Number     <- Sign? Digit+                            ;
    Expression <- Term (AddOp Term)*                      ;
    MulOp      <- '*' / '/'                               ;
    Term       <- Factor (MulOp Factor)*                  ;
    AddOp      <- '+'/'-'                                 ;
    Factor     <- '(' Expression ')' / Number             ;
END;
466
467
468 then its canonical serialization (except for whitespace) is
469
pt::grammar::peg {
    rules {
        AddOp      {is {/ {t +} {t -}}                                                                mode value}
        Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
        Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
        Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
        MulOp      {is {/ {t *} {t /}}                                                                mode value}
        Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
        Sign       {is {/ {t -} {t +}}                                                                mode value}
        Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
    }
    start {n Expression}
}
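
The key-ordering constraint of the canonical serialization can be checked with plain Tcl, without loading the package. The sketch below is illustrative only: the helper keys_sorted and the two-rule grammar are made up for this example, and the check covers constraint [1] only, not the whitespace constraint [2].

```tcl
# Hypothetical helper: true if the keys of dictionary d appear in
# ascending dictionary order, as produced by lsort -dictionary.
proc keys_sorted {d} {
    set keys [dict keys $d]
    expr {$keys eq [lsort -increasing -dictionary $keys]}
}

# Hand-written serialization of the grammar  A <- 'a' B ; B <- 'b'
set serial {pt::grammar::peg {rules {A {is {x {t a} {n B}} mode value} B {is {t b} mode leaf}} start {n A}}}

# Split the outer dictionary into its single key and its value.
lassign $serial tag contents
set rules [dict get $contents rules]

# Check every nesting level: contents, rules, and each rule.
set ok [expr {$tag eq "pt::grammar::peg"
              && [keys_sorted $contents]
              && [keys_sorted $rules]}]
dict for {nt def} $rules {
    if {![keys_sorted $def]} { set ok 0 }
}

puts "keys in canonical order: $ok"
```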
483
484
PE SERIALIZATION FORMAT
Here we specify the format used by the Parser Tools to serialize
Parsing Expressions as immutable values for transport, comparison, etc.

We distinguish between regular and canonical serializations. While a
parsing expression may have more than one regular serialization,
exactly one of them will be canonical.
492
493 Regular serialization
494
495 Atomic Parsing Expressions
496
497 [1] The string epsilon is an atomic parsing expres‐
498 sion. It matches the empty string.
499
500 [2] The string dot is an atomic parsing expression. It
501 matches any character.
502
503 [3] The string alnum is an atomic parsing expression.
504 It matches any Unicode alphabet or digit charac‐
505 ter. This is a custom extension of PEs based on
506 Tcl's builtin command string is.
507
508 [4] The string alpha is an atomic parsing expression.
509 It matches any Unicode alphabet character. This is
510 a custom extension of PEs based on Tcl's builtin
511 command string is.
512
[5]    The string ascii is an atomic parsing expression.
       It matches any Unicode character below U+0080. This
       is a custom extension of PEs based on Tcl's
       builtin command string is.
517
518 [6] The string control is an atomic parsing expres‐
519 sion. It matches any Unicode control character.
520 This is a custom extension of PEs based on Tcl's
521 builtin command string is.
522
523 [7] The string digit is an atomic parsing expression.
524 It matches any Unicode digit character. Note that
525 this includes characters outside of the [0..9]
526 range. This is a custom extension of PEs based on
527 Tcl's builtin command string is.
528
529 [8] The string graph is an atomic parsing expression.
530 It matches any Unicode printing character, except
531 for space. This is a custom extension of PEs based
532 on Tcl's builtin command string is.
533
534 [9] The string lower is an atomic parsing expression.
535 It matches any Unicode lower-case alphabet charac‐
536 ter. This is a custom extension of PEs based on
537 Tcl's builtin command string is.
538
539 [10] The string print is an atomic parsing expression.
540 It matches any Unicode printing character, includ‐
541 ing space. This is a custom extension of PEs based
542 on Tcl's builtin command string is.
543
544 [11] The string punct is an atomic parsing expression.
545 It matches any Unicode punctuation character. This
546 is a custom extension of PEs based on Tcl's
547 builtin command string is.
548
549 [12] The string space is an atomic parsing expression.
550 It matches any Unicode space character. This is a
551 custom extension of PEs based on Tcl's builtin
552 command string is.
553
554 [13] The string upper is an atomic parsing expression.
555 It matches any Unicode upper-case alphabet charac‐
556 ter. This is a custom extension of PEs based on
557 Tcl's builtin command string is.
558
559 [14] The string wordchar is an atomic parsing expres‐
560 sion. It matches any Unicode word character. This
561 is any alphanumeric character (see alnum), and any
562 connector punctuation characters (e.g. under‐
563 score). This is a custom extension of PEs based on
564 Tcl's builtin command string is.
565
566 [15] The string xdigit is an atomic parsing expression.
567 It matches any hexadecimal digit character. This
568 is a custom extension of PEs based on Tcl's
569 builtin command string is.
570
571 [16] The string ddigit is an atomic parsing expression.
572 It matches any decimal digit character. This is a
573 custom extension of PEs based on Tcl's builtin
574 command regexp.
575
576 [17] The expression [list t x] is an atomic parsing
577 expression. It matches the terminal string x.
578
579 [18] The expression [list n A] is an atomic parsing
580 expression. It matches the nonterminal A.
581
582 Combined Parsing Expressions
583
584 [1] For parsing expressions e1, e2, ... the result of
585 [list / e1 e2 ... ] is a parsing expression as
586 well. This is the ordered choice, aka prioritized
587 choice.
588
589 [2] For parsing expressions e1, e2, ... the result of
590 [list x e1 e2 ... ] is a parsing expression as
591 well. This is the sequence.
592
593 [3] For a parsing expression e the result of [list *
594 e] is a parsing expression as well. This is the
595 kleene closure, describing zero or more repeti‐
596 tions.
597
598 [4] For a parsing expression e the result of [list +
599 e] is a parsing expression as well. This is the
600 positive kleene closure, describing one or more
601 repetitions.
602
603 [5] For a parsing expression e the result of [list &
604 e] is a parsing expression as well. This is the
605 and lookahead predicate.
606
607 [6] For a parsing expression e the result of [list !
608 e] is a parsing expression as well. This is the
609 not lookahead predicate.
610
611 [7] For a parsing expression e the result of [list ?
612 e] is a parsing expression as well. This is the
613 optional input.
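
Since every operator above is expressed through the list command, a parsing expression can be assembled with plain Tcl and no package at all. The sketch below builds the right-hand side of a rule Expression <- Term (AddOp Term)*. The variable names are arbitrary.

```tcl
# Atomic expressions: references to two nonterminal symbols.
set term  [list n Term]
set addop [list n AddOp]

# Sequence  AddOp Term , wrapped in a kleene closure.
set tail [list * [list x $addop $term]]

# Sequence  Term (AddOp Term)*
set pe [list x $term $tail]

puts $pe

# An ordered choice of two terminals, '-' / '+'.
set sign [list / [list t -] [list t +]]
puts $sign
```

Building expressions with list instead of writing strings by hand leaves quoting to Tcl, which matters for terminals such as braces or the backslash.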
614
615 Canonical serialization
616 The canonical serialization of a parsing expression has the for‐
617 mat as specified in the previous item, and then additionally
618 satisfies the constraints below, which make it unique among all
619 the possible serializations of this parsing expression.
620
621 [1] The string representation of the value is the canonical
622 representation of a pure Tcl list. I.e. it does not con‐
623 tain superfluous whitespace.
624
625 [2] Terminals are not encoded as ranges (where start and end
626 of the range are identical).
627
628 EXAMPLE
629 Assuming the parsing expression shown on the right-hand side of the
630 rule
631
632 Expression <- Term (AddOp Term)*
633
634
635 then its canonical serialization (except for whitespace) is
636
637 {x {n Term} {* {x {n AddOp} {n Term}}}}
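
Because a serialized parsing expression is a nested Tcl list, it can be processed recursively with ordinary list commands. The sketch below collects the nonterminal symbols referenced by an expression. The procedure pe_nonterminals is a hypothetical helper written for this example and is not part of the package. It handles only the operators listed in this section and treats everything else as a leaf.

```tcl
# Hypothetical helper: return the nonterminal symbols referenced in
# the serialized parsing expression pe, in order of first occurrence.
proc pe_nonterminals {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        n {
            # Nonterminal reference: the symbol is the single argument.
            return [list [lindex $pe 1]]
        }
        / - x - * - + - & - ! - ? {
            # Combined expression: visit all children.
            set result {}
            foreach child [lrange $pe 1 end] {
                foreach nt [pe_nonterminals $child] {
                    if {$nt ni $result} { lappend result $nt }
                }
            }
            return $result
        }
        default {
            # Terminals and the atomic strings reference no symbols.
            return {}
        }
    }
}

puts [pe_nonterminals {x {n Term} {* {x {n AddOp} {n Term}}}}]
```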
638
639
BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category pt of the
Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
report any ideas for enhancements you may have for either package
and/or documentation.

When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.
649
650 Note further that attachments are strongly preferred over inlined
651 patches. Attachments can be made by going to the Edit form of the
652 ticket immediately after its creation, and then using the left-most
653 button in the secondary navigation bar.
654
KEYWORDS
EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
matching, parser, parsing expression, parsing expression grammar, push
down automaton, recursive descent, state, top-down parsing languages,
transducer

CATEGORY
Parsing and Grammars

COPYRIGHT
Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                  1                pt::peg::container(n)