pt::peg::import(n)          Parser Tools          pt::peg::import(n)

______________________________________________________________________________

NAME
pt::peg::import - PEG Import
9
SYNOPSIS
package require Tcl 8.5

package require snit

package require configuration

package require pt::peg

package require pluginmgr

package require pt::peg::import ?1?

::pt::peg::import objectName

objectName method ?arg arg ...?

objectName destroy

objectName import text text ?format?

objectName import file path ?format?

objectName import object text object text ?format?

objectName import object file object path ?format?

objectName includes

objectName include add path

objectName include remove path

objectName include clear

______________________________________________________________________________
46
DESCRIPTION
Are you lost? Do you have trouble understanding this document? In
that case please read the overview provided by the Introduction to
Parser Tools. That document is the entry point to the whole system the
current package is a part of.
52
This package provides a manager for parsing expression grammars. Each
manager instance handles a set of plugins which import grammars from
other formats, i.e. convert them from, for example, peg, container, or
json into the canonical serialization of a grammar.
57
It resides in the Import section of the Core Layer of Parser Tools, and
is one of the three pillars the management of parsing expression
grammars resides on.
61
62 IMAGE: arch_core_import
63
The other two pillars are, as shown above,

[1] PEG Export, and

[2] PEG Storage.
69
For information about the data structure which is the major output of
the manager objects provided by this package see the section PEG
serialization format.
73
74 The plugin system of our class is based on the package pluginmgr, and
75 configured to look for plugins using
76
77 [1] the environment variable GRAMMAR_PEG_IMPORT_PLUGINS,
78
79 [2] the environment variable GRAMMAR_PEG_PLUGINS,
80
81 [3] the environment variable GRAMMAR_PLUGINS,
82
83 [4] the path "~/.grammar/peg/import/plugin"
84
85 [5] the path "~/.grammar/peg/plugin"
86
87 [6] the path "~/.grammar/plugin"
88
89 [7] the path "~/.grammar/peg/import/plugins"
90
91 [8] the path "~/.grammar/peg/plugins"
92
93 [9] the path "~/.grammar/plugins"
94
[10] the registry entry
"HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PEG\IMPORT\PLUGINS"

[11] the registry entry
"HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PEG\PLUGINS"
100
101 [12] the registry entry "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PLUGINS"
102
103 The last three are used only when the package is run on a machine using
104 the Windows(tm) operating system.
105
106 The whole system is delivered with three predefined import plugins,
107 namely
108
container
See "PEG Import Plugin. From CONTAINER format" for details.

json See "PEG Import Plugin. From JSON format" for details.

peg See "PEG Import Plugin. From PEG format" for details.
115
For readers wishing to write their own import plugin for some format,
i.e. plugin writers, reading and understanding the Parser Tools Import
API specification is an absolute necessity, as it documents the
interaction between this package and its plugins in detail.
120
122 PACKAGE COMMANDS
123 ::pt::peg::import objectName
124 This command creates a new import manager object with an associ‐
125 ated Tcl command whose name is objectName. This object command
126 is explained in full detail in the sections Object command and
127 Object methods. The object command will be created under the
128 current namespace if the objectName is not fully qualified, and
129 in the specified namespace otherwise.
130
131 OBJECT COMMAND
132 All objects created by the ::pt::peg::import command have the following
133 general form:
134
135 objectName method ?arg arg ...?
The method method and its arguments determine the exact behavior
of the command. See section Object methods for the detailed
specifications.
139
140 OBJECT METHODS
141 objectName destroy
142 This method destroys the object it is invoked for.
143
144 objectName import text text ?format?
145 This method takes the text and converts it from the specified
146 format to the canonical serialization of a parsing expression
147 grammar using the import plugin for the format. An error is
148 thrown if no plugin could be found for the format. The serial‐
149 ization generated by the conversion process is returned as the
150 result of this method.
151
152 If no format is specified the method defaults to text.
153
154 The specification of what a canonical serialization is can be
155 found in the section PEG serialization format.
156
157 The plugin has to conform to the interface documented in the
158 Parser Tools Import API specification.
159
160 objectName import file path ?format?
161 This method is a convenient wrapper around the import text
162 method described by the previous item. It reads the contents of
163 the specified file into memory, feeds the result into import
164 text and returns the resulting serialization as its own result.
165
166 objectName import object text object text ?format?
This method is a convenient wrapper around the import text
method described above. It expects that object is an object
command supporting a deserialize method which accepts the
canonical serialization of a parsing expression grammar. It
imports the text using import text and then feeds the resulting
serialization into the object via deserialize. This method
returns the empty string as its result.
174
175 objectName import object file object path ?format?
176 This method behaves like import object text, except that it
177 reads the text to convert from the specified file instead of
178 being given it as argument.
179
180 objectName includes
181 This method returns a list containing the currently specified
182 paths to use to search for include files when processing input.
183 The order of paths in the list corresponds to the order in which
184 they are used, from first to last, and also corresponds to the
185 order in which they were added to the object.
186
187 objectName include add path
This method adds the specified path to the list of paths to use
to search for include files when processing input. The path is
added to the end of the list, causing it to be searched after
all previously added paths. The result of the command is the
empty string.
193
194 The method does nothing if the path is already known.
195
196 objectName include remove path
This method removes the specified path from the list of paths
to use to search for include files when processing input. The
result of the command is the empty string.
200
201 The method does nothing if the path is not known.
202
203 objectName include clear
204 This method clears the list of paths to use to search for
205 include files when processing input. The result of the command
206 is the empty string.
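
The methods above can be exercised with a short script. This is a
minimal sketch, assuming a Tcllib installation in which pt::peg::import
and its peg plugin can be loaded. The object name importer, the stand-in
command sink, and the variables pegText, serial, and received are
hypothetical names chosen for illustration. The format is always passed
explicitly, and the script only prints a notice when the package is not
available.

```tcl
# Grammar text in PEG format: a trimmed-down number grammar.
set pegText {
PEG number (Number)
    Digit  <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
    Sign   <- '-' / '+' ;
    Number <- Sign? Digit+ ;
END;
}

# Hypothetical stand-in for an object command with a "deserialize" method.
proc ::sink {method args} {
    if {$method ne "deserialize"} {
        return -code error "unsupported method \"$method\""
    }
    set ::received [lindex $args 0]
    return
}

set available [expr {![catch {package require pt::peg::import} err]}]
set serial   {}
set received {}

if {$available} {
    ::pt::peg::import importer

    # Text in, canonical serialization out.
    set serial [importer import text $pegText peg]
    puts [dict get $serial pt::grammar::peg start]

    # Same conversion, but the result is handed to ::sink via deserialize.
    importer import object text ::sink $pegText peg

    # Include paths are reported in the order in which they were added.
    importer include add [pwd]
    puts [importer includes]
    importer include clear

    importer destroy
} else {
    puts "pt::peg::import is not available: $err"
}
```

Using a plain proc as the receiving object keeps the sketch free of
further package dependencies; any command accepting a deserialize
method can take its place.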
207
PEG SERIALIZATION FORMAT
Here we specify the format used by the Parser Tools to serialize
Parsing Expression Grammars as immutable values for transport,
comparison, etc.
212
213 We distinguish between regular and canonical serializations. While a
214 PEG may have more than one regular serialization only exactly one of
215 them will be canonical.
216
217 regular serialization
218
219 [1] The serialization of any PEG is a nested Tcl dictionary.
220
221 [2] This dictionary holds a single key, pt::grammar::peg, and
222 its value. This value holds the contents of the grammar.
223
224 [3] The contents of the grammar are a Tcl dictionary holding
225 the set of nonterminal symbols and the starting expres‐
226 sion. The relevant keys and their values are
227
228 rules The value is a Tcl dictionary whose keys are the
229 names of the nonterminal symbols known to the
230 grammar.
231
232 [1] Each nonterminal symbol may occur only
233 once.
234
235 [2] The empty string is not a legal nonterminal
236 symbol.
237
238 [3] The value for each symbol is a Tcl dictio‐
239 nary itself. The relevant keys and their
240 values in this dictionary are
241
is The value is the serialization of
the parsing expression describing
the symbol's sentential structure, as
specified in the section PE
serialization format.
247
248 mode The value can be one of three values
249 specifying how a parser should han‐
250 dle the semantic value produced by
251 the symbol.
252
value The semantic value of the
nonterminal symbol is an
abstract syntax tree consisting
of a single node for the
nonterminal itself, which
has the ASTs of the symbol's
right hand side as its
children.

leaf The semantic value of the
nonterminal symbol is an
abstract syntax tree consisting
of a single node for the
nonterminal, without any
children. Any ASTs generated
by the symbol's right hand
side are discarded.
270
271 void The nonterminal has no seman‐
272 tic value. Any ASTs generated
273 by the symbol's right hand
274 side are discarded (as well).
275
276 start The value is the serialization of the start pars‐
277 ing expression of the grammar, as specified in the
278 section PE serialization format.
279
280 [4] The terminal symbols of the grammar are specified implic‐
281 itly as the set of all terminal symbols used in the start
282 expression and on the RHS of the grammar rules.
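
The structural rules of a regular serialization listed above can be
checked with plain dict commands. The following is a minimal sketch, not
part of the package; the command name checkPegSerial and its messages
are hypothetical. It does not validate the parsing expressions
themselves, and a value which is not a well-formed dictionary below the
top level still raises an ordinary Tcl error.

```tcl
# Returns an empty string when the value satisfies the structural rules of
# a regular serialization, otherwise a description of the first problem.
proc checkPegSerial {serial} {
    if {[catch {dict size $serial} n] || $n != 1
        || ![dict exists $serial pt::grammar::peg]} {
        return "expected a dictionary with the single key pt::grammar::peg"
    }
    set grammar [dict get $serial pt::grammar::peg]
    foreach key {rules start} {
        if {![dict exists $grammar $key]} {
            return "missing key \"$key\""
        }
    }
    set rules [dict get $grammar rules]
    # A dictionary written out as text may repeat a key. Compare the raw
    # element count against the number of distinct keys to detect that.
    if {[llength $rules] != 2 * [dict size $rules]} {
        return "a nonterminal symbol occurs more than once"
    }
    dict for {name rule} $rules {
        if {$name eq ""} {
            return "the empty string is not a legal nonterminal symbol"
        }
        foreach key {is mode} {
            if {![dict exists $rule $key]} {
                return "rule \"$name\" lacks the key \"$key\""
            }
        }
        if {[dict get $rule mode] ni {value leaf void}} {
            return "rule \"$name\" has an unknown mode"
        }
    }
    return ""
}

set good {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}
set bad  {pt::grammar::peg {rules {A {is {t a} mode tree}} start {n A}}}
puts "good: '[checkPegSerial $good]'"
puts "bad:  '[checkPegSerial $bad]'"
```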
283
284 canonical serialization
285 The canonical serialization of a grammar has the format as spec‐
286 ified in the previous item, and then additionally satisfies the
287 constraints below, which make it unique among all the possible
288 serializations of this grammar.
289
290 [1] The keys found in all the nested Tcl dictionaries are
291 sorted in ascending dictionary order, as generated by
292 Tcl's builtin command lsort -increasing -dict.
293
294 [2] The string representation of the value is the canonical
295 representation of a Tcl dictionary. I.e. it does not con‐
296 tain superfluous whitespace.
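
The two constraints can be illustrated by a small normalizer which turns
a regular serialization into the canonical one. This is a sketch under
stated assumptions and not the package's own implementation: the command
names canonPE, sortKeys, and canonPeg are hypothetical, and parsing
expressions are only rebuilt as pure lists, without any further
rewriting.

```tcl
# Rebuild a parsing expression as a pure list, dropping extra whitespace.
proc canonPE {pe} {
    set op [lindex $pe 0]
    if {$op in {/ x * + & ! ?}} {
        set out [list $op]
        foreach sub [lrange $pe 1 end] {
            lappend out [canonPE $sub]
        }
        return $out
    }
    # Atomic expressions (epsilon, dot, {t x}, {n A}, ...) have no nested
    # expressions, so rebuilding the list once is sufficient.
    return [list {*}$pe]
}

# Copy a dictionary with its keys in "lsort -increasing -dict" order.
proc sortKeys {d} {
    set out [dict create]
    foreach key [lsort -increasing -dict [dict keys $d]] {
        dict set out $key [dict get $d $key]
    }
    return $out
}

proc canonPeg {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set rules [dict create]
    dict for {name rule} [sortKeys [dict get $grammar rules]] {
        dict set rule is [canonPE [dict get $rule is]]
        dict set rules $name [sortKeys $rule]
    }
    set start [canonPE [dict get $grammar start]]
    return [list pt::grammar::peg [dict create rules $rules start $start]]
}

# Unsorted keys and superfluous whitespace on input.
set messy {pt::grammar::peg {start {n   B}  rules {B {mode value is {t b}} A {mode leaf is {x {n B}   {n B}}}}}}
set canon [canonPeg $messy]
puts $canon
```

Because every dictionary and list is rebuilt through dict and list
commands, Tcl generates the string representation itself, which is what
removes the superfluous whitespace.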
297
298 EXAMPLE
Assuming the following PEG for simple mathematical expressions

    PEG calculator (Expression)
        Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
        Sign       <- '-' / '+' ;
        Number     <- Sign? Digit+ ;
        Expression <- Term (AddOp Term)* ;
        MulOp      <- '*' / '/' ;
        Term       <- Factor (MulOp Factor)* ;
        AddOp      <- '+'/'-' ;
        Factor     <- '(' Expression ')' / Number ;
    END;

then its canonical serialization (except for whitespace) is

    pt::grammar::peg {
        rules {
            AddOp      {is {/ {t +} {t -}} mode value}
            Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
            Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
            Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
            MulOp      {is {/ {t *} {t /}} mode value}
            Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
            Sign       {is {/ {t -} {t +}} mode value}
            Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
        }
        start {n Expression}
    }
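
Since a serialization is a nested Tcl dictionary, its parts can be read
with the builtin dict command alone. The sketch below uses a
trimmed-down variant of the grammar above, reduced to three rules with
Number as start expression; the variable names are chosen for
illustration only.

```tcl
set serial {pt::grammar::peg {
    rules {
        Digit  {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
        Number {is {x {? {n Sign}} {+ {n Digit}}} mode value}
        Sign   {is {/ {t -} {t +}} mode value}
    }
    start {n Number}
}}

# The single top-level key holds the contents of the grammar.
set grammar [dict get $serial pt::grammar::peg]

# Names of the nonterminal symbols, the start expression, and one rule.
set names  [dict keys [dict get $grammar rules]]
set start  [dict get $grammar start]
set number [dict get $grammar rules Number is]

# Collect the mode of every nonterminal symbol.
set modes {}
dict for {name rule} [dict get $grammar rules] {
    lappend modes $name [dict get $rule mode]
}

puts "nonterminals: $names"
puts "start:        $start"
puts "Number is:    $number"
puts "modes:        $modes"
```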
328
329
PE SERIALIZATION FORMAT
Here we specify the format used by the Parser Tools to serialize
Parsing Expressions as immutable values for transport, comparison, etc.
333
334 We distinguish between regular and canonical serializations. While a
335 parsing expression may have more than one regular serialization only
336 exactly one of them will be canonical.
337
338 Regular serialization
339
340 Atomic Parsing Expressions
341
342 [1] The string epsilon is an atomic parsing expres‐
343 sion. It matches the empty string.
344
345 [2] The string dot is an atomic parsing expression. It
346 matches any character.
347
348 [3] The string alnum is an atomic parsing expression.
349 It matches any Unicode alphabet or digit charac‐
350 ter. This is a custom extension of PEs based on
351 Tcl's builtin command string is.
352
353 [4] The string alpha is an atomic parsing expression.
354 It matches any Unicode alphabet character. This is
355 a custom extension of PEs based on Tcl's builtin
356 command string is.
357
[5] The string ascii is an atomic parsing expression.
It matches any Unicode character below U+0080. This
is a custom extension of PEs based on Tcl's
builtin command string is.
362
363 [6] The string control is an atomic parsing expres‐
364 sion. It matches any Unicode control character.
365 This is a custom extension of PEs based on Tcl's
366 builtin command string is.
367
368 [7] The string digit is an atomic parsing expression.
369 It matches any Unicode digit character. Note that
370 this includes characters outside of the [0..9]
371 range. This is a custom extension of PEs based on
372 Tcl's builtin command string is.
373
374 [8] The string graph is an atomic parsing expression.
375 It matches any Unicode printing character, except
376 for space. This is a custom extension of PEs based
377 on Tcl's builtin command string is.
378
379 [9] The string lower is an atomic parsing expression.
380 It matches any Unicode lower-case alphabet charac‐
381 ter. This is a custom extension of PEs based on
382 Tcl's builtin command string is.
383
384 [10] The string print is an atomic parsing expression.
385 It matches any Unicode printing character, includ‐
386 ing space. This is a custom extension of PEs based
387 on Tcl's builtin command string is.
388
389 [11] The string punct is an atomic parsing expression.
390 It matches any Unicode punctuation character. This
391 is a custom extension of PEs based on Tcl's
392 builtin command string is.
393
394 [12] The string space is an atomic parsing expression.
395 It matches any Unicode space character. This is a
396 custom extension of PEs based on Tcl's builtin
397 command string is.
398
399 [13] The string upper is an atomic parsing expression.
400 It matches any Unicode upper-case alphabet charac‐
401 ter. This is a custom extension of PEs based on
402 Tcl's builtin command string is.
403
404 [14] The string wordchar is an atomic parsing expres‐
405 sion. It matches any Unicode word character. This
406 is any alphanumeric character (see alnum), and any
407 connector punctuation characters (e.g. under‐
408 score). This is a custom extension of PEs based on
409 Tcl's builtin command string is.
410
411 [15] The string xdigit is an atomic parsing expression.
412 It matches any hexadecimal digit character. This
413 is a custom extension of PEs based on Tcl's
414 builtin command string is.
415
416 [16] The string ddigit is an atomic parsing expression.
417 It matches any decimal digit character. This is a
418 custom extension of PEs based on Tcl's builtin
419 command regexp.
420
421 [17] The expression [list t x] is an atomic parsing
422 expression. It matches the terminal string x.
423
424 [18] The expression [list n A] is an atomic parsing
425 expression. It matches the nonterminal A.
426
427 Combined Parsing Expressions
428
429 [1] For parsing expressions e1, e2, ... the result of
430 [list / e1 e2 ... ] is a parsing expression as
431 well. This is the ordered choice, aka prioritized
432 choice.
433
434 [2] For parsing expressions e1, e2, ... the result of
435 [list x e1 e2 ... ] is a parsing expression as
436 well. This is the sequence.
437
[3] For a parsing expression e the result of [list *
e] is a parsing expression as well. This is the
Kleene closure, describing zero or more
repetitions.

[4] For a parsing expression e the result of [list +
e] is a parsing expression as well. This is the
positive Kleene closure, describing one or more
repetitions.
447
448 [5] For a parsing expression e the result of [list &
449 e] is a parsing expression as well. This is the
450 and lookahead predicate.
451
452 [6] For a parsing expression e the result of [list !
453 e] is a parsing expression as well. This is the
454 not lookahead predicate.
455
456 [7] For a parsing expression e the result of [list ?
457 e] is a parsing expression as well. This is the
458 optional input.
459
460 Canonical serialization
461 The canonical serialization of a parsing expression has the for‐
462 mat as specified in the previous item, and then additionally
463 satisfies the constraints below, which make it unique among all
464 the possible serializations of this parsing expression.
465
466 [1] The string representation of the value is the canonical
467 representation of a pure Tcl list. I.e. it does not con‐
468 tain superfluous whitespace.
469
[2] A terminal is not encoded as a range whose start and end
are identical.
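
The [list ...] forms used in the definitions above translate directly
into code. Below is a minimal sketch with hypothetical constructor names
(terminal, nonterminal, choice, sequence, repeat0, repeat1, optional);
they are not commands of this package. Values built with list are pure
Tcl lists, so their string representation carries no superfluous
whitespace.

```tcl
# Small constructors (hypothetical names) mirroring the forms listed above.
proc terminal    {char} { list t $char }
proc nonterminal {name} { list n $name }
proc choice      {args} { list / {*}$args }
proc sequence    {args} { list x {*}$args }
proc repeat0     {pe}   { list * $pe }
proc repeat1     {pe}   { list + $pe }
proc optional    {pe}   { list ? $pe }

# Sign? Digit+
set number [sequence [optional [nonterminal Sign]] [repeat1 [nonterminal Digit]]]

# '-' / '+'
set sign [choice [terminal -] [terminal +]]

puts $number
puts $sign
```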
472
473 EXAMPLE
474 Assuming the parsing expression shown on the right-hand side of the
475 rule
476
477 Expression <- Term (AddOp Term)*
478
479
480 then its canonical serialization (except for whitespace) is
481
482 {x {n Term} {* {x {n AddOp} {n Term}}}}
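
A serialized parsing expression is a nested list and can therefore be
traversed recursively. As an illustration, the sketch below collects the
nonterminal symbols referenced by the expression shown above; the
command name nonterminalsOf is hypothetical and not part of the package.

```tcl
# Return the nonterminal symbols referenced by a parsing expression, in
# order of first occurrence.
proc nonterminalsOf {pe} {
    switch -exact -- [lindex $pe 0] {
        n {
            return [list [lindex $pe 1]]
        }
        / - x - * - + - & - ! - ? {
            # Combined expression: visit every sub-expression.
            set found {}
            foreach sub [lrange $pe 1 end] {
                foreach name [nonterminalsOf $sub] {
                    if {$name ni $found} {
                        lappend found $name
                    }
                }
            }
            return $found
        }
        default {
            # Terminals and the other atomic expressions reference nothing.
            return {}
        }
    }
}

set pe {x {n Term} {* {x {n AddOp} {n Term}}}}
set used [nonterminalsOf $pe]
puts $used
```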
483
484
BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report them in the category pt of the
Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
report any ideas for enhancements you may have for either package
and/or documentation.
491
When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.
494
495 Note further that attachments are strongly preferred over inlined
496 patches. Attachments can be made by going to the Edit form of the
497 ticket immediately after its creation, and then using the left-most
498 button in the secondary navigation bar.
499
KEYWORDS
EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
matching, parser, parsing expression, parsing expression grammar, push
down automaton, recursive descent, state, top-down parsing languages,
transducer

CATEGORY
Parsing and Grammars

COPYRIGHT
Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib          1          pt::peg::import(n)