pt_import_api(i)                 Parser Tools                 pt_import_api(i)

______________________________________________________________________________

NAME
pt_import_api - Parser Tools Import API

SYNOPSIS
package require Tcl 8.5

CONVERTER convert text

IncludeFile currentfile path

::import text

______________________________________________________________________________

DESCRIPTION
Are you lost? Do you have trouble understanding this document? In that
case please read the overview provided by the Introduction to Parser
Tools. That document is the entry point to the whole system the current
package is a part of.

This document describes two APIs. First the API shared by all packages
for the conversion of some other format into Parsing Expression
Grammars, and then the API shared by the packages which implement the
import plugins sitting on top of the conversion packages.

Its intended audience is people who wish to create their own converter
for some type of input, and/or an import plugin for their own or some
other converter.

It resides in the Import section of the Core Layer of Parser Tools.

IMAGE: arch_core_import

CONVERTER API
Any (grammar) import converter has to follow the rules set out below:

[1] A converter is a package. Its name is arbitrary; however, it is
    recommended to put it under the ::pt::peg::from namespace.

[2] The package provides either a single Tcl command following the API
    outlined below, or a class command whose instances follow the same
    API. The commands which follow the API are called converter
    commands.

[3] A converter command has to provide the following single method with
    the given signature and semantics. Converter commands are allowed
    to provide more methods of their own, but not fewer, and they may
    not provide different semantics for the standardized method.

    CONVERTER convert text
        This method has to accept some text, a parsing expression
        grammar in some format. The result of the method has to be the
        canonical serialization of a parsing expression grammar, as
        specified in section PEG serialization format, the result of
        reading and converting the input text.
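
The converter rules above can be illustrated with a minimal sketch. It assumes a made-up "toy" input format in which the text is a single literal word and the resulting grammar matches exactly that word. The package name pt::peg::from::toy and the format itself are hypothetical and not part of Parser Tools; only the convert method and the shape of its result follow this document.

```tcl
package require Tcl 8.5

# Hypothetical converter for a made-up "toy" format: the input text is one
# literal word, and the resulting grammar matches exactly that word.
namespace eval ::pt::peg::from::toy {
    namespace export convert
    namespace ensemble create
}

proc ::pt::peg::from::toy::convert {text} {
    set word [string trim $text]
    if {$word eq ""} {
        return -code error "toy: empty input"
    }
    # Start expression: a sequence (x) of single-character terminals (t).
    set start [list x]
    foreach ch [split $word ""] {
        lappend start [list t $ch]
    }
    # No nonterminals are needed. The keys are created in sorted order
    # (rules, start), and building the value with list/dict commands keeps
    # superfluous whitespace out of the string representation.
    return [list pt::grammar::peg [dict create rules {} start $start]]
}

# Hypothetical package name and version.
package provide pt::peg::from::toy 0.1

puts [::pt::peg::from::toy convert ab]
# Prints: pt::grammar::peg {rules {} start {x {t a} {t b}}}
```

The namespace ensemble gives the single command ::pt::peg::from::toy a convert method, which is one way to satisfy rule [2] without a class system.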

PLUGIN API
Any (grammar) import plugin has to follow the rules set out below:

[1] A plugin is a package.

[2] The name of a plugin package has the form pt::peg::import::FOO,
    where FOO is the name of the format the plugin will accept input
    for.

[3] The plugin can expect that the package pt::peg::import::plugin is
    present, as indicator that it was invoked from a genuine plugin
    manager.

    It is recommended that a plugin checks for the presence of this
    package.

[4] The plugin can expect that a command named IncludeFile is present,
    with the signature

    IncludeFile currentfile path
        This command has to be invoked by the plugin when it has to
        process an included file, if the format has the concept of
        such.

        The plugin has to supply the following arguments

        string currentfile
            The path of the file it is currently processing. This may
            be the empty string if no such is known.

        string path
            The path of the include file as specified in the include
            directive being processed.

        The result of the command will be a 5-element list containing

        [1] A boolean flag indicating the success (True) or failure
            (False) of the operation.

        [2] In case of success the contents of the included file, and
            the empty string otherwise.

        [3] The resolved, i.e. absolute path of the included file, if
            possible, or the unchanged path argument. This is for
            display in an error message, or as the currentfile
            argument of another call to IncludeFile should this file
            contain more files.

        [4] In case of success an empty string, and for failure a code
            indicating the reason for it, one of

            notfound
                The specified file could not be found.

            notread
                The specified file was found, but could not be read
                into memory.

        [5] An empty string in case of success or a notfound failure,
            and an additional error message describing the reason for
            a notread error in more detail.

[5] A plugin has to provide a single command, in the global namespace,
    with the signature shown below. Plugins are allowed to provide more
    commands of their own, but not fewer, and they may not provide
    different semantics for the standardized command.

    ::import text
        This command has to accept a text containing a parsing
        expression grammar in some format. The result of the command
        has to be the result of the converter invoked by the plugin for
        the input grammar, the canonical serialization of the parsing
        expression grammar contained in the input.

        string text
            This argument will contain the parsing expression grammar
            for which to generate the serialization. The specification
            of what a canonical serialization is can be found in the
            section PEG serialization format.

[6] A single usage cycle of a plugin consists of an invocation of the
    command import. This call has to leave the plugin in a state where
    another usage cycle can be run without problems.
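
How a plugin might consume the 5-element result of IncludeFile (rule [4]) is sketched below. The sketch makes several assumptions: the input format is a made-up one with "include PATH" lines, the helper ExpandIncludes is hypothetical, and IncludeFile is replaced by an in-memory stand-in so that the code runs outside of a plugin manager. A real plugin would not define IncludeFile itself, and its ::import command would pass the expanded text on to its converter.

```tcl
package require Tcl 8.5

# Stand-in for the manager-provided command, so the sketch runs on its own.
# A real plugin must not define IncludeFile itself. The in-memory "files"
# and the /fake/ prefix are hypothetical.
set FakeFiles(lib.toy) "B\nC"
proc IncludeFile {currentfile path} {
    global FakeFiles
    if {![info exists FakeFiles($path)]} {
        # Failure: flag, no contents, unchanged path, code, no message.
        return [list 0 {} $path notfound {}]
    }
    # Success: flag, contents, resolved path, empty code, empty message.
    return [list 1 $FakeFiles($path) /fake/$path {} {}]
}

# Hypothetical helper: replace each "include PATH" line with the contents
# of the named file, recursing into nested includes.
proc ExpandIncludes {text {currentfile {}}} {
    set out {}
    foreach line [split $text \n] {
        if {![regexp {^include\s+(\S+)$} $line -> path]} {
            lappend out $line
            continue
        }
        lassign [IncludeFile $currentfile $path] ok data resolved code msg
        if {!$ok} {
            set reason [string trim "$code $msg"]
            return -code error "cannot include \"$resolved\": $reason"
        }
        # The resolved path becomes currentfile for nested includes.
        lappend out [ExpandIncludes $data $resolved]
    }
    return [join $out \n]
}

puts [ExpandIncludes "A\ninclude lib.toy\nD"]
```

Passing the resolved path down as currentfile follows the note in element [3] of the result list.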

USAGE
To use a converter do

    # Get the converter (single command here, not class)
    package require the-converter-package

    # Perform the conversion
    set serial [theconverter convert $thegrammartext]

    ... process the result ...

To use a plugin FOO do

    # Get an import plugin manager
    package require pt::peg::import
    pt::peg::import I

    # Run the plugin, and the converter inside.
    set serial [I import text $thegrammartext FOO]

    ... process the result ...


PEG SERIALIZATION FORMAT
Here we specify the format used by the Parser Tools to serialize
Parsing Expression Grammars as immutable values for transport,
comparison, etc.

We distinguish between regular and canonical serializations. While a
PEG may have more than one regular serialization, exactly one of them
will be canonical.

regular serialization

    [1] The serialization of any PEG is a nested Tcl dictionary.

    [2] This dictionary holds a single key, pt::grammar::peg, and its
        value. This value holds the contents of the grammar.

    [3] The contents of the grammar are a Tcl dictionary holding the
        set of nonterminal symbols and the starting expression. The
        relevant keys and their values are

        rules
            The value is a Tcl dictionary whose keys are the names of
            the nonterminal symbols known to the grammar.

            [1] Each nonterminal symbol may occur only once.

            [2] The empty string is not a legal nonterminal symbol.

            [3] The value for each symbol is a Tcl dictionary itself.
                The relevant keys and their values in this dictionary
                are

                is
                    The value is the serialization of the parsing
                    expression describing the symbol's sentential
                    structure, as specified in the section PE
                    serialization format.

                mode
                    The value can be one of three values specifying
                    how a parser should handle the semantic value
                    produced by the symbol.

                    value
                        The semantic value of the nonterminal symbol
                        is an abstract syntax tree consisting of a
                        single node for the nonterminal itself, which
                        has the ASTs of the symbol's right hand side
                        as its children.

                    leaf
                        The semantic value of the nonterminal symbol
                        is an abstract syntax tree consisting of a
                        single node for the nonterminal, without any
                        children. Any ASTs generated by the symbol's
                        right hand side are discarded.

                    void
                        The nonterminal has no semantic value. Any
                        ASTs generated by the symbol's right hand
                        side are discarded (as well).

        start
            The value is the serialization of the start parsing
            expression of the grammar, as specified in the section PE
            serialization format.

    [4] The terminal symbols of the grammar are specified implicitly
        as the set of all terminal symbols used in the start
        expression and on the RHS of the grammar rules.
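
The structural rules of the regular serialization can be checked with plain dict commands. The following is a minimal sketch, not the validation performed by Parser Tools itself. The helper name CheckPeg is hypothetical, and the sketch does not look inside the parsing expressions stored under is and start. It also cannot detect a symbol that occurs twice, because a Tcl dictionary silently keeps only the last value of a repeated key.

```tcl
package require Tcl 8.5

# Hypothetical helper: returns an empty string when the value has the
# regular structure described above, and a short complaint otherwise.
proc CheckPeg {serial} {
    if {[catch {dict size $serial} n] || $n != 1
        || ![dict exists $serial pt::grammar::peg]} {
        return "expected a dictionary with the single key pt::grammar::peg"
    }
    set g [dict get $serial pt::grammar::peg]
    foreach key {rules start} {
        if {![dict exists $g $key]} {
            return "missing key \"$key\""
        }
    }
    dict for {sym def} [dict get $g rules] {
        if {$sym eq ""} {
            return "the empty string is not a legal nonterminal symbol"
        }
        if {![dict exists $def is] || ![dict exists $def mode]} {
            return "rule \"$sym\" needs the keys is and mode"
        }
        if {[dict get $def mode] ni {value leaf void}} {
            return "rule \"$sym\" has an unknown mode"
        }
    }
    return ""
}

set serial {pt::grammar::peg {rules {Sign {is {/ {t -} {t +}} mode value}} start {n Sign}}}
puts "complaint: '[CheckPeg $serial]'"
```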

canonical serialization
    The canonical serialization of a grammar has the format as
    specified in the previous item, and then additionally satisfies the
    constraints below, which make it unique among all the possible
    serializations of this grammar.

    [1] The keys found in all the nested Tcl dictionaries are sorted in
        ascending dictionary order, as generated by Tcl's builtin
        command lsort -increasing -dict.

    [2] The string representation of the value is the canonical
        representation of a Tcl dictionary. I.e. it does not contain
        superfluous whitespace.
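
One way to bring a regular serialization closer to the canonical form is to rebuild every dictionary level with sorted keys. The sketch below assumes that approach. The helper names SortKeys and CanonicalPeg are hypothetical, and the parsing expressions stored under is and start are passed through unchanged, so they are assumed to be canonical already.

```tcl
package require Tcl 8.5

# Hypothetical helper: rebuild one dictionary level with its keys in the
# order produced by lsort -increasing -dict.
proc SortKeys {d} {
    set res {}
    foreach k [lsort -increasing -dict [dict keys $d]] {
        dict set res $k [dict get $d $k]
    }
    return $res
}

# Hypothetical helper: sort the grammar level, the rules level, and each
# rule definition. Rebuilding the dictionaries also drops superfluous
# whitespace between keys and values.
proc CanonicalPeg {serial} {
    set g [SortKeys [dict get $serial pt::grammar::peg]]
    set rules {}
    dict for {sym def} [SortKeys [dict get $g rules]] {
        dict set rules $sym [SortKeys $def]
    }
    dict set g rules $rules
    return [dict create pt::grammar::peg $g]
}

set regular {pt::grammar::peg {start {n B} rules {B {mode value is {t b}} A {is {t a} mode leaf}}}}
puts [CanonicalPeg $regular]
# Prints: pt::grammar::peg {rules {A {is {t a} mode leaf} B {is {t b} mode value}} start {n B}}
```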

EXAMPLE
Assuming the following PEG for simple mathematical expressions

    PEG calculator (Expression)
        Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
        Sign       <- '-' / '+' ;
        Number     <- Sign? Digit+ ;
        Expression <- Term (AddOp Term)* ;
        MulOp      <- '*' / '/' ;
        Term       <- Factor (MulOp Factor)* ;
        AddOp      <- '+'/'-' ;
        Factor     <- '(' Expression ')' / Number ;
    END;


then its canonical serialization (except for whitespace) is

    pt::grammar::peg {
        rules {
            AddOp      {is {/ {t +} {t -}} mode value}
            Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
            Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
            Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
            MulOp      {is {/ {t *} {t /}} mode value}
            Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
            Sign       {is {/ {t -} {t +}} mode value}
            Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
        }
        start {n Expression}
    }


PE SERIALIZATION FORMAT
Here we specify the format used by the Parser Tools to serialize
Parsing Expressions as immutable values for transport, comparison, etc.

We distinguish between regular and canonical serializations. While a
parsing expression may have more than one regular serialization,
exactly one of them will be canonical.

Regular serialization

    Atomic Parsing Expressions

        [1] The string epsilon is an atomic parsing expression. It
            matches the empty string.

        [2] The string dot is an atomic parsing expression. It matches
            any character.

        [3] The string alnum is an atomic parsing expression. It
            matches any Unicode alphabet or digit character. This is a
            custom extension of PEs based on Tcl's builtin command
            string is.

        [4] The string alpha is an atomic parsing expression. It
            matches any Unicode alphabet character. This is a custom
            extension of PEs based on Tcl's builtin command string is.

        [5] The string ascii is an atomic parsing expression. It
            matches any Unicode character below U+0080. This is a
            custom extension of PEs based on Tcl's builtin command
            string is.

        [6] The string control is an atomic parsing expression. It
            matches any Unicode control character. This is a custom
            extension of PEs based on Tcl's builtin command string is.

        [7] The string digit is an atomic parsing expression. It
            matches any Unicode digit character. Note that this
            includes characters outside of the [0..9] range. This is a
            custom extension of PEs based on Tcl's builtin command
            string is.

        [8] The string graph is an atomic parsing expression. It
            matches any Unicode printing character, except for space.
            This is a custom extension of PEs based on Tcl's builtin
            command string is.

        [9] The string lower is an atomic parsing expression. It
            matches any Unicode lower-case alphabet character. This is
            a custom extension of PEs based on Tcl's builtin command
            string is.

        [10] The string print is an atomic parsing expression. It
            matches any Unicode printing character, including space.
            This is a custom extension of PEs based on Tcl's builtin
            command string is.

        [11] The string punct is an atomic parsing expression. It
            matches any Unicode punctuation character. This is a
            custom extension of PEs based on Tcl's builtin command
            string is.

        [12] The string space is an atomic parsing expression. It
            matches any Unicode space character. This is a custom
            extension of PEs based on Tcl's builtin command string is.

        [13] The string upper is an atomic parsing expression. It
            matches any Unicode upper-case alphabet character. This is
            a custom extension of PEs based on Tcl's builtin command
            string is.

        [14] The string wordchar is an atomic parsing expression. It
            matches any Unicode word character. This is any
            alphanumeric character (see alnum), and any connector
            punctuation characters (e.g. underscore). This is a custom
            extension of PEs based on Tcl's builtin command string is.

        [15] The string xdigit is an atomic parsing expression. It
            matches any hexadecimal digit character. This is a custom
            extension of PEs based on Tcl's builtin command string is.

        [16] The string ddigit is an atomic parsing expression. It
            matches any decimal digit character. This is a custom
            extension of PEs based on Tcl's builtin command regexp.

        [17] The expression [list t x] is an atomic parsing
            expression. It matches the terminal string x.

        [18] The expression [list n A] is an atomic parsing
            expression. It matches the nonterminal A.

    Combined Parsing Expressions

        [1] For parsing expressions e1, e2, ... the result of [list /
            e1 e2 ... ] is a parsing expression as well. This is the
            ordered choice, aka prioritized choice.

        [2] For parsing expressions e1, e2, ... the result of [list x
            e1 e2 ... ] is a parsing expression as well. This is the
            sequence.

        [3] For a parsing expression e the result of [list * e] is a
            parsing expression as well. This is the Kleene closure,
            describing zero or more repetitions.

        [4] For a parsing expression e the result of [list + e] is a
            parsing expression as well. This is the positive Kleene
            closure, describing one or more repetitions.

        [5] For a parsing expression e the result of [list & e] is a
            parsing expression as well. This is the and lookahead
            predicate.

        [6] For a parsing expression e the result of [list ! e] is a
            parsing expression as well. This is the not lookahead
            predicate.

        [7] For a parsing expression e the result of [list ? e] is a
            parsing expression as well. This is the optional input.
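
Because the rules above are phrased in terms of Tcl's list command, a serialization can be assembled directly with nested list calls. The short sketch below builds the right-hand sides of two rules from the calculator grammar used in this document. The variable names are illustrative only.

```tcl
package require Tcl 8.5

# Number <- Sign? Digit+ ;
# A sequence (x) of an optional (?) nonterminal and a repeated (+) one.
set number [list x [list ? [list n Sign]] [list + [list n Digit]]]

# Factor <- '(' Expression ')' / Number ;
# An ordered choice (/) between a sequence and a nonterminal.
set factor [list / \
    [list x [list t (] [list n Expression] [list t )]] \
    [list n Number]]

puts $number
# Prints: x {? {n Sign}} {+ {n Digit}}
puts $factor
# Prints: / {x {t (} {n Expression} {t )}} {n Number}
```

Values built this way have no superfluous whitespace in their string representation, since Tcl generates it from the list structure.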

Canonical serialization
    The canonical serialization of a parsing expression has the format
    as specified in the previous item, and then additionally satisfies
    the constraints below, which make it unique among all the possible
    serializations of this parsing expression.

    [1] The string representation of the value is the canonical
        representation of a pure Tcl list. I.e. it does not contain
        superfluous whitespace.

    [2] Terminals are not encoded as ranges (where start and end of the
        range are identical).
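
Constraint [1] can be met by rebuilding the expression as a pure list, level by level. The following is a minimal sketch of that idea. The helper name CanonicalPe is hypothetical, it assumes the operator set is exactly the one listed under Combined Parsing Expressions, and it does not address constraint [2].

```tcl
package require Tcl 8.5

# Hypothetical helper: rebuild a parsing expression as a pure Tcl list so
# that its string representation carries no superfluous whitespace.
proc CanonicalPe {pe} {
    set op [lindex $pe 0]
    if {$op in {/ x * + & ! ?}} {
        # Combined expression: normalize each operand recursively.
        set res [list $op]
        foreach sub [lrange $pe 1 end] {
            lappend res [CanonicalPe $sub]
        }
        return $res
    }
    # Atomic expression (epsilon, dot, {t x}, {n A}, ...): rebuild the
    # list from its elements as they are.
    return [list {*}$pe]
}

puts [CanonicalPe "x   {n  Term}  {*  {x {n AddOp}   {n Term}}}"]
# Prints: x {n Term} {* {x {n AddOp} {n Term}}}
```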

EXAMPLE
Assuming the parsing expression shown on the right-hand side of the
rule

    Expression <- Term (AddOp Term)*


then its canonical serialization (except for whitespace) is

    {x {n Term} {* {x {n AddOp} {n Term}}}}


BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category pt of the
Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
report any ideas for enhancements you may have for either package
and/or documentation.

When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.

Note further that attachments are strongly preferred over inlined
patches. Attachments can be made by going to the Edit form of the
ticket immediately after its creation, and then using the left-most
button in the secondary navigation bar.

KEYWORDS
EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
matching, parser, parsing expression, parsing expression grammar, push
down automaton, recursive descent, state, top-down parsing languages,
transducer

CATEGORY
Parsing and Grammars

COPYRIGHT
Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                1                       pt_import_api(i)