pt::peg::export(n)               Parser Tools               pt::peg::export(n)

______________________________________________________________________________

NAME
       pt::peg::export - PEG Export

SYNOPSIS
       package require Tcl 8.5

       package require snit

       package require configuration

       package require pt::peg

       package require pluginmgr

       package require pt::peg::export ?1?

       ::pt::peg::export objectName

       objectName method ?arg arg ...?

       objectName destroy

       objectName export serial serial ?format?

       objectName export object object ?format?

       objectName configuration names

       objectName configuration get

       objectName configuration set name ?value?

       objectName configuration unset pattern...

______________________________________________________________________________

DESCRIPTION
       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. That document is the entry point to the whole system
       the current package is a part of.

       This package provides a manager for parsing expression grammars.
       Each instance handles a set of plugins for exporting grammars to
       other formats, i.e. converting them to, for example, nroff or HTML.

       It resides in the Export section of the Core Layer of Parser Tools,
       and is one of the three pillars on which the management of parsing
       expression grammars rests.

       IMAGE: arch_core_export

       The other two pillars are, as shown above,

       [1]    PEG Import, and

       [2]    PEG Storage

       For information about the data structure which is the major input
       to the manager objects provided by this package see the section PEG
       serialization format.

       The plugin system of this class is based on the package pluginmgr,
       and configured to look for plugins using

       [1]    the environment variable GRAMMAR_PEG_EXPORT_PLUGINS,

       [2]    the environment variable GRAMMAR_PEG_PLUGINS,

       [3]    the environment variable GRAMMAR_PLUGINS,

       [4]    the path "~/.grammar/peg/export/plugin"

       [5]    the path "~/.grammar/peg/plugin"

       [6]    the path "~/.grammar/plugin"

       [7]    the path "~/.grammar/peg/export/plugins"

       [8]    the path "~/.grammar/peg/plugins"

       [9]    the path "~/.grammar/plugins"

       [10]   the registry entry
              "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PEG\EXPORT\PLUGINS"

       [11]   the registry entry
              "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PEG\PLUGINS"

       [12]   the registry entry "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PLUGINS"

       The last three are used only when the package is run on a machine
       using the Windows(tm) operating system.

       The whole system is delivered with three predefined export plugins,
       namely

       container
              See "PEG Export Plugin. To CONTAINER format" for details.

       json   See "PEG Export Plugin. To JSON format" for details.

       peg    See "PEG Export Plugin. To PEG format" for details.

       Readers wishing to write their own export plugin for some format,
       i.e. plugin writers, must read and understand the Parser Tools
       Export API specification, as it documents the interaction between
       this package and its plugins in detail.

PACKAGE COMMANDS
       ::pt::peg::export objectName
              This command creates a new export manager object with an
              associated Tcl command whose name is objectName. This object
              command is explained in full detail in the sections Object
              command and Object methods. The object command will be
              created under the current namespace if the objectName is not
              fully qualified, and in the specified namespace otherwise.

OBJECT COMMAND
       All object commands created by the ::pt::peg::export command have
       the following general form:

       objectName method ?arg arg ...?
              The method method and its arguments determine the exact
              behavior of the command. See section Object methods for the
              detailed specifications.

OBJECT METHODS
       objectName destroy
              This method destroys the object it is invoked for.

       objectName export serial serial ?format?
              This method takes the canonical serialization of a parsing
              expression grammar stored in serial and converts it to the
              specified format, using the export plugin for the format.
              The method fails with an error if no plugin can be found for
              the format. The string generated by the conversion process
              is returned as the result of this method.

              If no format is specified the method defaults to text.

              The specification of what a canonical serialization is can
              be found in the section PEG serialization format.

              The plugin has to conform to the interface documented in the
              Parser Tools Export API specification.

       objectName export object object ?format?
              This method is a convenient wrapper around the export serial
              method described by the previous item. It expects that
              object is an object command supporting a serialize method
              returning the canonical serialization of a parsing
              expression grammar. It invokes that method, feeds the result
              into export serial and returns the resulting string as its
              own result.
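
       The following is a minimal sketch of the export serial method, not
       a definitive recipe. It assumes that Tcllib is installed, that the
       predefined peg plugin can be found, and that the pt::peg package
       offers a canonicalize command as described in its own manual page.
       The command name exporter and the one-rule grammar are made up for
       illustration.

```tcl
package require Tcl 8.5
package require pt::peg            ;# assumed to provide "pt::peg canonicalize"
package require pt::peg::export

# A regular (not necessarily canonical) serialization of a tiny grammar.
set regular {pt::grammar::peg {
    rules {
        Sign {is {/ {t -} {t +}} mode value}
    }
    start {n Sign}
}}

# "export serial" expects the canonical form, so normalize first.
set serial [pt::peg canonicalize $regular]

# Create a manager and convert the grammar with the predefined "peg" plugin.
# The format is given explicitly instead of relying on the default.
pt::peg::export exporter
set text [exporter export serial $serial peg]
puts $text

exporter destroy
```

       An object command with a serialize method could be passed to
       export object in the same way, in place of the serialization
       itself.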

       objectName configuration names
              This method returns a list containing the names of all
              configuration options currently known to the object.

       objectName configuration get
              This method returns a dictionary containing the names and
              values of all configuration options currently known to the
              object.

       objectName configuration set name ?value?
              This method sets the configuration option name to the
              specified value and returns the new value of the option.

              If no value is specified it simply returns the current
              value, without changing it.

              Note that these configuration options and their values are
              simply passed to a plugin when the actual export is
              performed. It is the plugin which checks the validity, not
              the manager.

       objectName configuration unset pattern...
              This method unsets all configuration options matching the
              specified glob patterns. If no pattern is specified it will
              unset all currently defined configuration options.
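
       As a rough illustration of the configuration methods, the sketch
       below stores two options, queries one, and removes it again by
       glob pattern. The option names name and user are hypothetical.
       Which options a plugin actually accepts is defined by the plugin,
       and nothing is validated until an export is performed.

```tcl
package require Tcl 8.5
package require pt::peg::export

pt::peg::export exporter

# "name" and "user" are hypothetical option names chosen for this sketch.
exporter configuration set name calculator
exporter configuration set user alice

# Without a value, "set" only queries the option.
set current [exporter configuration set name]
puts "name = $current"

# Inspect what is stored, then drop every option matching a glob pattern.
puts [exporter configuration names]
exporter configuration unset na*
set remaining [exporter configuration names]
puts $remaining

exporter destroy
```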

PEG SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While
       a PEG may have more than one regular serialization, exactly one of
       them will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl
                     dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg,
                     and its value. This value holds the contents of the
                     grammar.

              [3]    The contents of the grammar are a Tcl dictionary
                     holding the set of nonterminal symbols and the
                     starting expression. The relevant keys and their
                     values are

                     rules  The value is a Tcl dictionary whose keys are
                            the names of the nonterminal symbols known to
                            the grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal
                                   nonterminal symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys
                                   and their values in this dictionary
                                   are

                                   is     The value is the serialization
                                          of the parsing expression
                                          describing the symbol's
                                          sentential structure, as
                                          specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three
                                          values specifying how a parser
                                          should handle the semantic
                                          value produced by the symbol.

                                          value  The semantic value of
                                                 the nonterminal symbol
                                                 is an abstract syntax
                                                 tree consisting of a
                                                 single node for the
                                                 nonterminal itself,
                                                 which has the ASTs of
                                                 the symbol's right hand
                                                 side as its children.

                                          leaf   The semantic value of
                                                 the nonterminal symbol
                                                 is an abstract syntax
                                                 tree consisting of a
                                                 single node for the
                                                 nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the
                                                 symbol's right hand
                                                 side are discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any
                                                 ASTs generated by the
                                                 symbol's right hand
                                                 side are discarded (as
                                                 well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as
                            specified in the section PE serialization
                            format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used
                     in the start expression and on the RHS of the
                     grammar rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among
              all the possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries
                     are sorted in ascending dictionary order, as
                     generated by Tcl's builtin command lsort -increasing
                     -dict.

              [2]    The string representation of the value is the
                     canonical representation of a Tcl dictionary. I.e.
                     it does not contain superfluous whitespace.
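
       Constraint [1] can be checked with plain Tcl. The sketch below
       uses a hypothetical helper, keysAreSorted, and two made-up grammar
       contents. It tests only the key order of a single dictionary
       level, not the whitespace constraint [2].

```tcl
# Returns 1 when the keys of the dictionary appear in the order produced
# by "lsort -increasing -dict", and 0 otherwise. Hypothetical helper.
proc keysAreSorted {dictionary} {
    set keys [dict keys $dictionary]
    return [expr {$keys eq [lsort -increasing -dict $keys]}]
}

# Two made-up grammar contents holding the same rules in different order.
set sorted   {rules {A {is {t a} mode value} B {is {t b} mode leaf}} start {n A}}
set unsorted {start {n A} rules {B {is {t b} mode leaf} A {is {t a} mode value}}}

puts [keysAreSorted $sorted]                    ;# top-level keys
puts [keysAreSorted [dict get $sorted rules]]   ;# nonterminal names
puts [keysAreSorted $unsorted]
```

       A complete check would apply the helper to every nested
       dictionary, including the per-symbol dictionaries holding the keys
       is and mode.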

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
                  Sign       <- '-' / '+' ;
                  Number     <- Sign? Digit+ ;
                  Expression <- Term (AddOp Term)* ;
                  MulOp      <- '*' / '/' ;
                  Term       <- Factor (MulOp Factor)* ;
                  AddOp      <- '+'/'-' ;
                  Factor     <- '(' Expression ')' / Number ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}} mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
                      MulOp      {is {/ {t *} {t /}} mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
                      Sign       {is {/ {t -} {t +}} mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
                  }
                  start {n Expression}
              }
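
       Because the serialization is a nested Tcl dictionary, its parts
       can be read with the builtin dict command. The sketch below works
       on a two-rule excerpt of the calculator grammar, which is not a
       complete grammar by itself, and the variable names are chosen
       only for illustration.

```tcl
# Excerpt of the calculator serialization, laid out with extra whitespace.
set serial {pt::grammar::peg {
    rules {
        Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
        Sign       {is {/ {t -} {t +}}                         mode value}
    }
    start {n Expression}
}}

# The single top-level key gives access to the contents of the grammar.
set grammar [dict get $serial pt::grammar::peg]

puts [dict get $grammar start]                  ;# n Expression
puts [dict keys [dict get $grammar rules]]      ;# Expression Sign
puts [dict get $grammar rules Expression is]    ;# parsing expression of the rule
puts [dict get $grammar rules Sign mode]        ;# value
```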

PE SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison,
       etc.

       We distinguish between regular and canonical serializations. While
       a parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing
                            expression. It matches any character.

                     [3]    The string alnum is an atomic parsing
                            expression. It matches any Unicode alphabetic
                            or digit character. This is a custom extension
                            of PEs based on Tcl's builtin command string
                            is.

                     [4]    The string alpha is an atomic parsing
                            expression. It matches any Unicode alphabetic
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [5]    The string ascii is an atomic parsing
                            expression. It matches any Unicode character
                            below U+0080. This is a custom extension of
                            PEs based on Tcl's builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing
                            expression. It matches any Unicode digit
                            character. Note that this includes characters
                            outside of the [0..9] range. This is a custom
                            extension of PEs based on Tcl's builtin
                            command string is.

                     [8]    The string graph is an atomic parsing
                            expression. It matches any Unicode printing
                            character, except for space. This is a custom
                            extension of PEs based on Tcl's builtin
                            command string is.

                     [9]    The string lower is an atomic parsing
                            expression. It matches any Unicode lower-case
                            alphabetic character. This is a custom
                            extension of PEs based on Tcl's builtin
                            command string is.

                     [10]   The string print is an atomic parsing
                            expression. It matches any Unicode printing
                            character, including space. This is a custom
                            extension of PEs based on Tcl's builtin
                            command string is.

                     [11]   The string punct is an atomic parsing
                            expression. It matches any Unicode punctuation
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [12]   The string space is an atomic parsing
                            expression. It matches any Unicode space
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [13]   The string upper is an atomic parsing
                            expression. It matches any Unicode upper-case
                            alphabetic character. This is a custom
                            extension of PEs based on Tcl's builtin
                            command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word
                            character. This is any alphanumeric character
                            (see alnum), and any connector punctuation
                            characters (e.g. underscore). This is a custom
                            extension of PEs based on Tcl's builtin
                            command string is.

                     [15]   The string xdigit is an atomic parsing
                            expression. It matches any hexadecimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [16]   The string ddigit is an atomic parsing
                            expression. It matches any decimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command regexp.

                     [17]   The expression [list t x] is an atomic
                            parsing expression. It matches the terminal
                            string x.

                     [18]   The expression [list n A] is an atomic
                            parsing expression. It matches the
                            nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the
                            result of [list / e1 e2 ... ] is a parsing
                            expression as well. This is the ordered
                            choice, aka prioritized choice.

                     [2]    For parsing expressions e1, e2, ... the
                            result of [list x e1 e2 ... ] is a parsing
                            expression as well. This is the sequence.

                     [3]    For a parsing expression e the result of
                            [list * e] is a parsing expression as well.
                            This is the Kleene closure, describing zero
                            or more repetitions.

                     [4]    For a parsing expression e the result of
                            [list + e] is a parsing expression as well.
                            This is the positive Kleene closure,
                            describing one or more repetitions.

                     [5]    For a parsing expression e the result of
                            [list & e] is a parsing expression as well.
                            This is the and lookahead predicate.

                     [6]    For a parsing expression e the result of
                            [list ! e] is a parsing expression as well.
                            This is the not lookahead predicate.

                     [7]    For a parsing expression e the result of
                            [list ? e] is a parsing expression as well.
                            This is the optional input.
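
       Since every form above is written in terms of Tcl's list command,
       serializations can be assembled directly in a script. The
       constructor procedures in the sketch below are hypothetical helper
       names, not part of any package. They build the right-hand sides of
       two rules from the calculator grammar.

```tcl
package require Tcl 8.5

# Small constructors mirroring the forms above (hypothetical helper names).
proc terminal    {char} { list t $char }
proc nonterminal {name} { list n $name }
proc choice      {args} { list / {*}$args }
proc sequence    {args} { list x {*}$args }
proc optional    {pe}   { list ? $pe }
proc repeat1     {pe}   { list + $pe }

# Sign <- '-' / '+'
set sign [choice [terminal -] [terminal +]]

# Number <- Sign? Digit+
set number [sequence [optional [nonterminal Sign]] [repeat1 [nonterminal Digit]]]

puts $sign      ;# / {t -} {t +}
puts $number    ;# x {? {n Sign}} {+ {n Digit}}
```

       Building the values with list leaves the quoting and the spacing
       of the string representation to Tcl.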

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then
              additionally satisfies the constraints below, which make it
              unique among all the possible serializations of this parsing
              expression.

              [1]    The string representation of the value is the
                     canonical representation of a pure Tcl list. I.e. it
                     does not contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and
                     end of the range are identical).

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

              Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

              {x {n Term} {* {x {n AddOp} {n Term}}}}
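
       A serialized parsing expression is an ordinary nested list, so it
       can be traversed recursively. The sketch below collects the
       nonterminals referenced by an expression. The procedure name
       nonterminals is hypothetical, and only the operators listed in
       this section are handled. Anything else is treated as an atomic
       expression without references.

```tcl
package require Tcl 8.5

# Returns the sorted list of nonterminal names used in a serialized PE.
# Hypothetical helper, covering only the forms listed in this section.
proc nonterminals {pe} {
    switch -exact -- [lindex $pe 0] {
        n {
            return [list [lindex $pe 1]]
        }
        / - x - * - + - & - ! - ? {
            set result {}
            foreach child [lrange $pe 1 end] {
                lappend result {*}[nonterminals $child]
            }
            return [lsort -unique $result]
        }
        default {
            return {}
        }
    }
}

set pe {x {n Term} {* {x {n AddOp} {n Term}}}}
puts [nonterminals $pe]    ;# AddOp Term
```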

BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly
       contain bugs and other problems. Please report such in the category
       pt of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].
       Please also report any ideas for enhancements you may have for
       either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS
       EBNF, LL(k), PEG, TDPL, context-free languages, expression,
       grammar, matching, parser, parsing expression, parsing expression
       grammar, push down automaton, recursive descent, state, top-down
       parsing languages, transducer

CATEGORY
       Parsing and Grammars

COPYRIGHT
       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1                    pt::peg::export(n)