pt::peg::import(n)               Parser Tools               pt::peg::import(n)

______________________________________________________________________________

NAME
       pt::peg::import - PEG Import

SYNOPSIS
       package require Tcl 8.5

       package require snit

       package require fileutil::paths

       package require pt::peg

       package require pluginmgr

       package require pt::peg::import ?1.0.1?

       ::pt::peg::import objectName

       objectName method ?arg arg ...?

       objectName destroy

       objectName import text text ?format?

       objectName import file path ?format?

       objectName import object text object text ?format?

       objectName import object file object path ?format?

       objectName includes

       objectName include add path

       objectName include remove path

       objectName include clear

______________________________________________________________________________

DESCRIPTION
       Are you lost? Do you have trouble understanding this document? In that
       case please read the overview provided by the Introduction to Parser
       Tools. That document is the entry point to the whole system the
       current package is a part of.

       This package provides a manager for parsing expression grammars. Each
       instance handles a set of plugins for importing grammars from other
       formats, i.e. for converting them from, for example, peg, container,
       or json.

       It resides in the Import section of the Core Layer of Parser Tools,
       and is one of the three pillars the management of parsing expression
       grammars rests on.

       IMAGE: arch_core_import

       The other two pillars are, as shown above

       [1]    PEG Export, and

       [2]    PEG Storage

       For information about the data structure which is the major output of
       the manager objects provided by this package see the section PEG
       serialization format.

       The plugin system of our class is based on the package pluginmgr, and
       is configured to look for plugins using

       [1]    the environment variable GRAMMAR_PEG_IMPORT_PLUGINS,

       [2]    the environment variable GRAMMAR_PEG_PLUGINS,

       [3]    the environment variable GRAMMAR_PLUGINS,

       [4]    the path "~/.grammar/peg/import/plugin"

       [5]    the path "~/.grammar/peg/plugin"

       [6]    the path "~/.grammar/plugin"

       [7]    the path "~/.grammar/peg/import/plugins"

       [8]    the path "~/.grammar/peg/plugins"

       [9]    the path "~/.grammar/plugins"

       [10]   the registry entry
              "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PEG\IMPORT\PLUGINS"

       [11]   the registry entry
              "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PEG\PLUGINS"

       [12]   the registry entry "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PLUGINS"

       The last three are used only when the package is run on a machine
       using the Windows(tm) operating system.

       The whole system is delivered with three predefined import plugins,
       namely

       container
              See PEG Import Plugin. From CONTAINER format for details.

       json   See PEG Import Plugin. From JSON format for details.

       peg    See PEG Import Plugin. From PEG format for details.

       For readers wishing to write their own import plugin for some format,
       i.e. plugin writers, reading and understanding the Parser Tools Import
       API specification is an absolute necessity, as it documents the
       interaction between this package and its plugins in detail.

API
   PACKAGE COMMANDS
       ::pt::peg::import objectName
              This command creates a new import manager object with an
              associated Tcl command whose name is objectName. This object
              command is explained in full detail in the sections Object
              command and Object methods. The object command will be created
              under the current namespace if the objectName is not fully
              qualified, and in the specified namespace otherwise.

   OBJECT COMMAND
       All objects created by the ::pt::peg::import command have the
       following general form:

       objectName method ?arg arg ...?
              The method method and its arguments determine the exact
              behavior of the command. See section Object methods for the
              detailed specifications.

   OBJECT METHODS
       objectName destroy
              This method destroys the object it is invoked for.

       objectName import text text ?format?
              This method takes the text and converts it from the specified
              format to the canonical serialization of a parsing expression
              grammar using the import plugin for the format. An error is
              thrown if no plugin could be found for the format. The
              serialization generated by the conversion process is returned
              as the result of this method.

              If no format is specified the method defaults to text.

              The specification of what a canonical serialization is can be
              found in the section PEG serialization format.

              The plugin has to conform to the interface documented in the
              Parser Tools Import API specification.

       objectName import file path ?format?
              This method is a convenient wrapper around the import text
              method described by the previous item. It reads the contents
              of the specified file into memory, feeds the result into
              import text and returns the resulting serialization as its own
              result.

       objectName import object text object text ?format?
              This method is a convenient wrapper around the import text
              method described above. It expects that object is an object
              command supporting a deserialize method expecting the
              canonical serialization of a parsing expression grammar. It
              imports the text using import text and then feeds the
              resulting serialization into the object via deserialize. This
              method returns the empty string as its result.

       objectName import object file object path ?format?
              This method behaves like import object text, except that it
              reads the text to convert from the specified file instead of
              being given it as argument.

       objectName includes
              This method returns a list containing the currently specified
              paths to use to search for include files when processing
              input. The order of paths in the list corresponds to the order
              in which they are used, from first to last, and also
              corresponds to the order in which they were added to the
              object.

       objectName include add path
              This method adds the specified path to the list of paths to
              use to search for include files when processing input. The
              path is added to the end of the list, causing it to be
              searched after all previously added paths. The result of the
              command is the empty string.

              The method does nothing if the path is already known.

       objectName include remove path
              This method removes the specified path from the list of paths
              to use to search for include files when processing input. The
              result of the command is the empty string.

              The method does nothing if the path is not known.

       objectName include clear
              This method clears the list of paths to use to search for
              include files when processing input. The result of the command
              is the empty string.

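       The object methods described in this section might be combined as in
       the following minimal sketch. It assumes that Tcllib is installed
       together with the predefined peg import plugin. The command name
       importer, the directory /tmp/grammar-includes and the tiny grammar are
       illustrative choices, not part of the package. The format is named
       explicitly so that the sketch does not rely on the default.

```tcl
package require pt::peg::import

# Create a manager object; the command name "importer" is arbitrary.
pt::peg::import importer

# Register a hypothetical directory to search for include files.
importer include add /tmp/grammar-includes

# A tiny grammar in the textual PEG format.
set text {
PEG sign (Sign)
    Sign <- '-' / '+' ;
END;
}

# Convert the text into a serialization, naming the format explicitly.
set serial [importer import text $text peg]
puts $serial

# Show the include search paths, then reset them.
puts [importer includes]
importer include clear

# Release the manager object once it is no longer needed.
importer destroy
```

       The value stored in serial is a Tcl dictionary and can be examined
       with the dict command.
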
PEG SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of them
       will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg,
                     and its value. This value holds the contents of the
                     grammar.

              [3]    The contents of the grammar are a Tcl dictionary
                     holding the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal
                                   nonterminal symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three
                                          values specifying how a parser
                                          should handle the semantic value
                                          produced by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified
                            in the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in
                     the start expression and on the RHS of the grammar
                     rules.

       canonical serialization
              The canonical serialization of a grammar has the format
              specified in the previous item, and additionally satisfies the
              constraints below, which make it unique among all the possible
              serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary, i.e. it does not
                     contain superfluous whitespace.

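       The two constraints can be illustrated in plain Tcl. The sketch below
       is not how the package itself computes canonical forms; the helpers
       sortKeys and canonicalPeg are hypothetical names, and the sketch
       assumes that the parsing expressions stored under is and start are
       already canonical.

```tcl
# Hypothetical helper: rebuild a dictionary with its keys in sorted order.
proc sortKeys {d} {
    set sorted [dict create]
    foreach key [lsort -increasing -dictionary [dict keys $d]] {
        dict set sorted $key [dict get $d $key]
    }
    return $sorted
}

# Hypothetical helper: sort the keys at every dictionary level of a PEG
# serialization. The PEs under "is" and "start" are copied unchanged.
proc canonicalPeg {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set rules [dict create]
    dict for {symbol rule} [sortKeys [dict get $grammar rules]] {
        dict set rules $symbol [sortKeys $rule]
    }
    set grammar [dict create rules $rules start [dict get $grammar start]]
    return [dict create pt::grammar::peg $grammar]
}

# A regular serialization with keys out of order and extra whitespace.
set regular {
    pt::grammar::peg {
        start {n A}
        rules {
            B {mode void  is {t b}}
            A {mode value is {n B}}
        }
    }
}
set canonical [canonicalPeg $regular]
puts $canonical
```

       Rebuilding the value with dict create lets Tcl generate the string
       representation, which removes the superfluous whitespace of the
       input.
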
   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
                  Sign       <- '-' / '+' ;
                  Number     <- Sign? Digit+ ;
                  Expression <- Term (AddOp Term)* ;
                  MulOp      <- '*' / '/' ;
                  Term       <- Factor (MulOp Factor)* ;
                  AddOp      <- '+'/'-' ;
                  Factor     <- '(' Expression ')' / Number ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}} mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
                      MulOp      {is {/ {t *} {t /}} mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
                      Sign       {is {/ {t -} {t +}} mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
                  }
                  start {n Expression}
              }

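       Because a serialization is an ordinary nested Tcl dictionary, it can
       be examined with the builtin dict command. The following sketch uses
       a reduced grammar of three rules, chosen only to keep the example
       short; the variable names are illustrative.

```tcl
# A reduced serialization in the documented layout (three rules only).
set serial {
    pt::grammar::peg {
        rules {
            Digit  {is {/ {t 0} {t 1}} mode value}
            Number {is {x {? {n Sign}} {+ {n Digit}}} mode value}
            Sign   {is {/ {t -} {t +}} mode value}
        }
        start {n Number}
    }
}

# The single top-level key holds the contents of the grammar.
set grammar [dict get $serial pt::grammar::peg]

# Report each nonterminal with its mode and its parsing expression.
dict for {symbol rule} [dict get $grammar rules] {
    puts "$symbol ([dict get $rule mode]): [dict get $rule is]"
}

# The start expression is itself a PE serialization.
puts "start: [dict get $grammar start]"
```
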
PE SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison,
       etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression.
                            It matches any character.

                     [3]    The string alnum is an atomic parsing
                            expression. It matches any Unicode alphabet or
                            digit character. This is a custom extension of
                            PEs based on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing
                            expression. It matches any Unicode alphabet
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [5]    The string ascii is an atomic parsing
                            expression. It matches any Unicode character
                            below U+0080. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing
                            expression. It matches any Unicode digit
                            character. Note that this includes characters
                            outside of the [0..9] range. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [8]    The string graph is an atomic parsing
                            expression. It matches any Unicode printing
                            character, except for space. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [9]    The string lower is an atomic parsing
                            expression. It matches any Unicode lower-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing
                            expression. It matches any Unicode printing
                            character, including space. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [11]   The string punct is an atomic parsing
                            expression. It matches any Unicode punctuation
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [12]   The string space is an atomic parsing
                            expression. It matches any Unicode space
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [13]   The string upper is an atomic parsing
                            expression. It matches any Unicode upper-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word
                            character. This is any alphanumeric character
                            (see alnum), and any connector punctuation
                            characters (e.g. underscore). This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [15]   The string xdigit is an atomic parsing
                            expression. It matches any hexadecimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [16]   The string ddigit is an atomic parsing
                            expression. It matches any decimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result
                            of [list / e1 e2 ... ] is a parsing expression
                            as well. This is the ordered choice, aka
                            prioritized choice.

                     [2]    For parsing expressions e1, e2, ... the result
                            of [list x e1 e2 ... ] is a parsing expression
                            as well. This is the sequence.

                     [3]    For a parsing expression e the result of
                            [list * e] is a parsing expression as well. This
                            is the Kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of
                            [list + e] is a parsing expression as well. This
                            is the positive Kleene closure, describing one
                            or more repetitions.

                     [5]    For a parsing expression e the result of
                            [list & e] is a parsing expression as well. This
                            is the and lookahead predicate.

                     [6]    For a parsing expression e the result of
                            [list ! e] is a parsing expression as well. This
                            is the not lookahead predicate.

                     [7]    For a parsing expression e the result of
                            [list ? e] is a parsing expression as well. This
                            is the optional input.

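       Since the rules above are phrased in terms of the list command, a
       parsing expression can be assembled bottom-up in plain Tcl. The
       sketch below builds the expression Term (AddOp Term)*; the variable
       names are illustrative only.

```tcl
# Atomic expressions: references to two nonterminals.
set term  [list n Term]
set addop [list n AddOp]

# Combined expressions: (AddOp Term)* and the enclosing sequence.
set repeat [list * [list x $addop $term]]
set pe     [list x $term $repeat]

puts $pe
```

       Building the value with list, rather than by concatenating strings,
       leaves the quoting and spacing of the result to Tcl.
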
       Canonical serialization
              The canonical serialization of a parsing expression has the
              format specified in the previous item, and additionally
              satisfies the constraints below, which make it unique among
              all the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list, i.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and
                     end of the range are identical).

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

              Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

              {x {n Term} {* {x {n AddOp} {n Term}}}}

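       The nested list form lends itself to recursive processing. As a
       minimal sketch, the hypothetical helper peNonterminals below (not
       part of the package) walks a serialized parsing expression and
       collects the nonterminals it references. It handles only the
       operators listed in this section and treats everything else as an
       atomic expression without references.

```tcl
# Hypothetical helper: list the nonterminals referenced by a PE serialization.
proc peNonterminals {pe} {
    set operands [lassign $pe op]
    switch -exact -- $op {
        n {
            # {n A}: a reference to the nonterminal A.
            return [list [lindex $operands 0]]
        }
        / - x - * - + - & - ! - ? {
            # Operators: collect from every operand, left to right.
            set found {}
            foreach sub $operands {
                lappend found {*}[peNonterminals $sub]
            }
            return $found
        }
        default {
            # Terminals and the named atomic expressions reference nothing.
            return {}
        }
    }
}

set pe {x {n Term} {* {x {n AddOp} {n Term}}}}
set used [peNonterminals $pe]
puts $used
puts [lsort -unique $used]
```

       Applied to the expression of the example, the helper reports Term,
       AddOp and Term in order of appearance; lsort -unique reduces this to
       the set of distinct names.
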
BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS
       EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
       matching, parser, parsing expression, parsing expression grammar,
       push down automaton, recursive descent, state, top-down parsing
       languages, transducer

CATEGORY
       Parsing and Grammars

COPYRIGHT
       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                               1.0.1                  pt::peg::import(n)