pt_export_api(i)                 Parser Tools                 pt_export_api(i)

______________________________________________________________________________

NAME
       pt_export_api - Parser Tools Export API

SYNOPSIS
       package require Tcl 8.5

       CONVERTER reset

       CONVERTER configure

       CONVERTER configure option

       CONVERTER configure option value...

       CONVERTER convert serial

       ::export serial configuration

______________________________________________________________________________

DESCRIPTION
       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. That document is the entry point to the whole system
       of which the current package is a part.

       This document describes two APIs. First the API shared by all
       packages for the conversion of Parsing Expression Grammars into
       some other format, and then the API shared by the packages which
       implement the export plugins sitting on top of the conversion
       packages.

       Its intended audience is people who wish to create their own
       converter for some type of output, and/or an export plugin for
       their own or some other converter.

       It resides in the Export section of the Core Layer of Parser Tools.

       IMAGE: arch_core_export

CONVERTER API
       Any (grammar) export converter has to follow the rules set out
       below:

       [1]    A converter is a package. Its name is arbitrary, however it
              is recommended to put it under the ::pt::peg::to namespace.

       [2]    The package provides either a single Tcl command following
              the API outlined below, or a class command whose instances
              follow the same API. The commands which follow the API are
              called converter commands.

       [3]    A converter command has to provide the following three
              methods with the given signatures and semantics. Converter
              commands are allowed to provide more methods of their own,
              but not fewer, and they may not provide different semantics
              for the standardized methods.

              CONVERTER reset
                     This method has to reset the configuration of the
                     converter to its default settings. The result of the
                     method has to be the empty string.

              CONVERTER configure
                     This method, in this form, has to return a dictionary
                     containing the current configuration of the
                     converter.

              CONVERTER configure option
                     This method, in this form, has to return the current
                     value of the specified configuration option of the
                     converter.

                     Please read the section Options for the set of
                     standard options any converter has to accept. Any
                     other options accepted by a specific converter will
                     be described in its manpage.

              CONVERTER configure option value...
                     This method, in this form, sets the specified options
                     of the converter to the given values.

                     Please read the section Options for the set of
                     standard options any converter has to accept. Any
                     other options accepted by a specific converter will
                     be described in its manpage.

              CONVERTER convert serial
                     This method has to accept the canonical serialization
                     of a parsing expression grammar, as specified in the
                     section PEG serialization format, and contained in
                     serial. The result of the method has to be the result
                     of converting the input grammar into whatever the
                     converter is for, per its configuration.

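       The three methods can be illustrated with a minimal sketch of a
       converter command, written as a namespace ensemble in plain Tcl
       8.5. The package name demo::to::count, its namespace, and the
       text it produces are hypothetical and exist only for
       illustration. The sketch knows only the three standard options
       from the section Options and does not verify that its input is
       canonical.

```tcl
# Hypothetical converter: reports how many rules a grammar has.
# The name "demo::to::count" is an assumption made for this sketch.
package provide demo::to::count 0.1

namespace eval ::demo::to::count {
    # Defaults of the three standard options.
    variable defaults {-file unknown -name a_pe_grammar -user unknown}
    variable config   $defaults

    namespace export reset configure convert
    namespace ensemble create
}

# CONVERTER reset: back to the defaults, result is the empty string.
proc ::demo::to::count::reset {} {
    variable defaults
    variable config
    set config $defaults
    return
}

# CONVERTER configure ?option? ?value option value...?
proc ::demo::to::count::configure {args} {
    variable config
    if {[llength $args] == 0} {
        return $config
    }
    if {[llength $args] == 1} {
        set option [lindex $args 0]
        if {![dict exists $config $option]} {
            return -code error "unknown option \"$option\""
        }
        return [dict get $config $option]
    }
    if {[llength $args] % 2} {
        return -code error "option without a value"
    }
    # Validate everything first, so a bad call changes nothing.
    foreach {option value} $args {
        if {![dict exists $config $option]} {
            return -code error "unknown option \"$option\""
        }
    }
    foreach {option value} $args {
        dict set config $option $value
    }
    return
}

# CONVERTER convert serial: the "conversion" here is just a summary line.
proc ::demo::to::count::convert {serial} {
    variable config
    set rules [dict get $serial pt::grammar::peg rules]
    return "[dict get $config -name]: [dict size $rules] rule(s)"
}

# Usage
::demo::to::count configure -name calculator
puts [::demo::to::count convert \
    {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}]
```

       Keeping the configuration in a single dictionary makes reset and
       the no-argument form of configure trivial, which is why the
       sketch does not use one variable per option.
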
PLUGIN API
       Any (grammar) export plugin has to follow the rules set out below:

       [1]    A plugin is a package.

       [2]    The name of a plugin package has the form
              pt::peg::export::FOO, where FOO is the name of the format
              the plugin will generate output for.

       [3]    The plugin can expect that the package
              pt::peg::export::plugin is present, as an indicator that it
              was invoked from a genuine plugin manager.

              It is recommended that a plugin check for the presence of
              this package.

       [4]    A plugin has to provide a single command, in the global
              namespace, with the signature shown below. Plugins are
              allowed to provide more commands of their own, but not
              fewer, and they may not provide different semantics for the
              standardized command.

              ::export serial configuration
                     This command has to accept the canonical
                     serialization of a parsing expression grammar and the
                     configuration for the converter invoked by the
                     plugin. The result of the command has to be the
                     result of the converter invoked by the plugin for the
                     input grammar and configuration.

                     string serial
                            This argument will contain the canonical
                            serialization of the parsing expression
                            grammar for which to generate the output. The
                            specification of what a canonical
                            serialization is can be found in the section
                            PEG serialization format.

                     dictionary configuration
                            This argument will contain the configuration
                            to configure the converter with before
                            invoking it, as a dictionary mapping from
                            options to values.

                            Please read the section Options for the set of
                            standard options any converter has to accept,
                            and thus any plugin as well. Any other options
                            accepted by a specific plugin will be
                            described in its manpage.

       [5]    A single usage cycle of a plugin consists of an invocation
              of the command export. This call has to leave the plugin in
              a state where another usage cycle can be run without
              problems.

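       The plugin rules can be sketched in a similar way. The stand-in
       converter ::demo::conv below is hypothetical and only exists so
       that the example runs on its own; a real plugin would load an
       actual converter package instead, live in a package named
       pt::peg::export::FOO, and check for pt::peg::export::plugin as
       recommended in rule [3]. That check is left out here so that the
       sketch also runs outside of a plugin manager.

```tcl
# Stand-in converter (hypothetical), following the converter API.
namespace eval ::demo::conv {
    variable defaults {-file unknown -name a_pe_grammar -user unknown}
    variable config   $defaults
    namespace export reset configure convert
    namespace ensemble create
}
proc ::demo::conv::reset {} {
    variable defaults
    variable config
    set config $defaults
    return
}
proc ::demo::conv::configure {args} {
    variable config
    if {[llength $args] == 0} { return $config }
    if {[llength $args] == 1} { return [dict get $config [lindex $args 0]] }
    set config [dict merge $config $args]
    return
}
proc ::demo::conv::convert {serial} {
    variable config
    return "grammar [dict get $config -name] from [dict get $config -file]"
}

# The plugin proper: a single command "export" in the global namespace.
proc ::export {serial configuration} {
    # Start every usage cycle from the defaults, so that options given
    # in an earlier cycle cannot leak into this one (rule [5]).
    ::demo::conv reset
    if {[dict size $configuration]} {
        # Pass the options through to the converter unchanged.
        ::demo::conv configure {*}$configuration
    }
    return [::demo::conv convert $serial]
}

# Usage: two independent cycles.
set g {pt::grammar::peg {rules {} start epsilon}}
puts [::export $g {-name calc -file calc.peg}]
puts [::export $g {}]
```

       Resetting the converter at the start of each call is one simple
       way to satisfy rule [5]; creating a fresh converter instance per
       call would work equally well for class-based converters.
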
OPTIONS
       Each export converter, and each plugin for an export converter, has
       to accept the options below: a converter through its configure
       method, a plugin through its configuration argument. Converters are
       allowed to ignore the contents of these options when performing a
       conversion, but they must not reject them. Plugins are expected to
       pass the options given to them to the converter they are invoking.

       -file string
              The value of this option is the name of the file or other
              entity from which the grammar came, for which the command is
              run. The default value is unknown.

       -name string
              The value of this option is the name of the grammar we are
              processing. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for which
              the command is run. The default value is unknown.

USAGE
       To use a converter do

           # Get the converter (single command here, not class)
           package require the-converter-package

           # Provide a configuration
           theconverter configure ...

           # Perform the conversion
           set result [theconverter convert $thegrammarserial]

           # ... process the result ...

       To use a plugin FOO do

           # Get an export plugin manager
           package require pt::peg::export
           pt::peg::export E

           # Provide a configuration
           E configuration set ...

           # Run the plugin, and the converter inside.
           set result [E export serial $grammarserial FOO]

           # ... process the result ...

PEG SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While
       a PEG may have more than one regular serialization, exactly one of
       them will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl
                     dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg,
                     and its value. This value holds the contents of the
                     grammar.

              [3]    The contents of the grammar are a Tcl dictionary
                     holding the set of nonterminal symbols and the
                     starting expression. The relevant keys and their
                     values are

                     rules  The value is a Tcl dictionary whose keys are
                            the names of the nonterminal symbols known to
                            the grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal
                                   nonterminal symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys
                                   and their values in this dictionary are

                                   is     The value is the serialization
                                          of the parsing expression
                                          describing the symbol's
                                          sentential structure, as
                                          specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three
                                          values specifying how a parser
                                          should handle the semantic value
                                          produced by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single
                                                 node for the nonterminal
                                                 itself, which has the
                                                 ASTs of the symbol's
                                                 right hand side as its
                                                 children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single
                                                 node for the nonterminal,
                                                 without any children. Any
                                                 ASTs generated by the
                                                 symbol's right hand side
                                                 are discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as
                            specified in the section PE serialization
                            format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in
                     the start expression and on the RHS of the grammar
                     rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among
              all the possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the
                     canonical representation of a Tcl dictionary. I.e. it
                     does not contain superfluous whitespace.

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

           PEG calculator (Expression)
               Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
               Sign       <- '-' / '+' ;
               Number     <- Sign? Digit+ ;
               Expression <- Term (AddOp Term)* ;
               MulOp      <- '*' / '/' ;
               Term       <- Factor (MulOp Factor)* ;
               AddOp      <- '+'/'-' ;
               Factor     <- '(' Expression ')' / Number ;
           END;

       then its canonical serialization (except for whitespace) is

           pt::grammar::peg {
               rules {
                   AddOp      {is {/ {t +} {t -}} mode value}
                   Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
                   Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
                   Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
                   MulOp      {is {/ {t *} {t /}} mode value}
                   Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
                   Sign       {is {/ {t -} {t +}} mode value}
                   Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
               }
               start {n Expression}
           }

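       As an illustration of the two canonical constraints, the sketch
       below turns a regular serialization into one whose dictionary
       keys are sorted and whose whitespace is normalized at the grammar
       level. The command name canonical_peg is hypothetical, and the
       sketch assumes that the parsing expressions stored under is and
       start are already canonical, since it copies them unchanged.

```tcl
# Sketch only: "canonical_peg" is a name invented for this example.
# Rebuilds the grammar dictionaries with sorted keys. The parsing
# expressions under "is" and "start" are copied as they are.
proc canonical_peg {serial} {
    set grammar [dict get $serial pt::grammar::peg]

    set rules {}
    set symbols [dict keys [dict get $grammar rules]]
    foreach symbol [lsort -increasing -dict $symbols] {
        set def [dict get $grammar rules $symbol]
        # "is" sorts before "mode", so this order is already canonical.
        dict set rules $symbol \
            [list is [dict get $def is] mode [dict get $def mode]]
    }

    # "rules" sorts before "start".
    return [list pt::grammar::peg \
        [list rules $rules start [dict get $grammar start]]]
}

# Usage: keys out of order and extra whitespace on input.
set regular {pt::grammar::peg {start {n A}   rules {B {mode value is {t b}} A {is {n B} mode value}}}}
puts [canonical_peg $regular]
```

       Building the result with list and dict set lets Tcl generate the
       string representation, which avoids superfluous whitespace
       without any manual string handling.
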
PE SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison,
       etc.

       We distinguish between regular and canonical serializations. While
       a parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing
                            expression. It matches any character.

                     [3]    The string alnum is an atomic parsing
                            expression. It matches any Unicode alphabet or
                            digit character. This is a custom extension of
                            PEs based on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing
                            expression. It matches any Unicode alphabet
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [5]    The string ascii is an atomic parsing
                            expression. It matches any Unicode character
                            below U+0080. This is a custom extension of
                            PEs based on Tcl's builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing
                            expression. It matches any Unicode digit
                            character. Note that this includes characters
                            outside of the [0..9] range. This is a custom
                            extension of PEs based on Tcl's builtin
                            command string is.

                     [8]    The string graph is an atomic parsing
                            expression. It matches any Unicode printing
                            character, except for space. This is a custom
                            extension of PEs based on Tcl's builtin
                            command string is.

                     [9]    The string lower is an atomic parsing
                            expression. It matches any Unicode lower-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string
                            is.

                     [10]   The string print is an atomic parsing
                            expression. It matches any Unicode printing
                            character, including space. This is a custom
                            extension of PEs based on Tcl's builtin
                            command string is.

                     [11]   The string punct is an atomic parsing
                            expression. It matches any Unicode punctuation
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [12]   The string space is an atomic parsing
                            expression. It matches any Unicode space
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [13]   The string upper is an atomic parsing
                            expression. It matches any Unicode upper-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string
                            is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word
                            character. This is any alphanumeric character
                            (see alnum), and any connector punctuation
                            characters (e.g. underscore). This is a custom
                            extension of PEs based on Tcl's builtin
                            command string is.

                     [15]   The string xdigit is an atomic parsing
                            expression. It matches any hexadecimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [16]   The string ddigit is an atomic parsing
                            expression. It matches any decimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result
                            of [list / e1 e2 ... ] is a parsing expression
                            as well. This is the ordered choice, aka
                            prioritized choice.

                     [2]    For parsing expressions e1, e2, ... the result
                            of [list x e1 e2 ... ] is a parsing expression
                            as well. This is the sequence.

                     [3]    For a parsing expression e the result of
                            [list * e] is a parsing expression as well.
                            This is the Kleene closure, describing zero or
                            more repetitions.

                     [4]    For a parsing expression e the result of
                            [list + e] is a parsing expression as well.
                            This is the positive Kleene closure,
                            describing one or more repetitions.

                     [5]    For a parsing expression e the result of
                            [list & e] is a parsing expression as well.
                            This is the and lookahead predicate.

                     [6]    For a parsing expression e the result of
                            [list ! e] is a parsing expression as well.
                            This is the not lookahead predicate.

                     [7]    For a parsing expression e the result of
                            [list ? e] is a parsing expression as well.
                            This is the optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then
              additionally satisfies the constraints below, which make it
              unique among all the possible serializations of this parsing
              expression.

              [1]    The string representation of the value is the
                     canonical representation of a pure Tcl list. I.e. it
                     does not contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and
                     end of the range are identical).

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

           Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

           {x {n Term} {* {x {n AddOp} {n Term}}}}

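       The recursive structure of the format can be made concrete with
       a small sketch that walks a serialized parsing expression,
       checks it against the atomic and combined forms listed above,
       and rebuilds it as a pure Tcl list without superfluous
       whitespace. The command name canonical_pe is hypothetical. The
       sketch covers only the forms listed in this section, so it
       rejects anything else, including range expressions.

```tcl
# Sketch only: "canonical_pe" is a name invented for this example.
proc canonical_pe {pe} {
    # The named atomic parsing expressions, items [1] to [16].
    set atoms {
        epsilon dot alnum alpha ascii control digit graph lower
        print punct space upper wordchar xdigit ddigit
    }
    set op [lindex $pe 0]
    if {[llength $pe] == 1 && $op in $atoms} {
        return $op
    }
    switch -exact -- $op {
        t - n {
            # Terminal or nonterminal: exactly one argument.
            if {[llength $pe] != 2} { error "bad expression \"$pe\"" }
            return [list $op [lindex $pe 1]]
        }
        / - x {
            # Ordered choice or sequence: one or more sub-expressions.
            if {[llength $pe] < 2} { error "bad expression \"$pe\"" }
            set result [list $op]
            foreach sub [lrange $pe 1 end] {
                lappend result [canonical_pe $sub]
            }
            return $result
        }
        * - + - & - ! - ? {
            # Repetition, lookahead, optional: one sub-expression.
            if {[llength $pe] != 2} { error "bad expression \"$pe\"" }
            return [list $op [canonical_pe [lindex $pe 1]]]
        }
    }
    error "unknown parsing expression \"$pe\""
}

# Usage: the expression from the example, with extra whitespace.
puts [canonical_pe "x  {n Term}   {*  {x {n AddOp}  {n Term}}}"]
```

       Rebuilding every level with list is what removes the extra
       whitespace; the structure of the expression itself is not
       changed.
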
BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly
       contain bugs and other problems. Please report such in the category
       pt of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].
       Please also report any ideas for enhancements you may have for
       either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS
       EBNF, LL(k), PEG, TDPL, context-free languages, expression,
       grammar, matching, parser, parsing expression, parsing expression
       grammar, push down automaton, recursive descent, state, top-down
       parsing languages, transducer

CATEGORY
       Parsing and Grammars

COPYRIGHT
       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1                      pt_export_api(i)