pt::peg::export::json(n)         Parser Tools         pt::peg::export::json(n)

______________________________________________________________________________

NAME
pt::peg::export::json - PEG Export Plugin. Write JSON format

SYNOPSIS
package require Tcl 8.5

package require pt::peg::export::json ?1?

package require pt::peg::to::json

export serial configuration

______________________________________________________________________________

DESCRIPTION
Are you lost? Do you have trouble understanding this document? In that
case please read the overview provided by the Introduction to Parser
Tools. That document is the entry point to the whole system the current
package is a part of.

This package implements the parsing expression grammar export plugin
for the generation of JSON markup.

It resides in the Export section of the Core Layer of Parser Tools and
is intended to be used by pt::peg::export, the export manager, sitting
between it and the corresponding core conversion functionality provided
by pt::peg::to::json.

IMAGE: arch_core_eplugins

While the direct use of this package with a regular interpreter is
possible, this is strongly discouraged and requires a number of
contortions to provide the expected environment. The proper way to use
this functionality depends on the situation:

[1] In an untrusted environment the proper access is through the
    package pt::peg::export and the export manager objects it provides.

[2] In a trusted environment however simply use the package
    pt::peg::to::json and access the core conversion functionality
    directly.

API
The API provided by this package satisfies the specification of the
Plugin API found in the Parser Tools Export API specification.

export serial configuration
    This command takes the canonical serialization of a parsing
    expression grammar, contained in serial and formatted as specified
    in section PEG serialization format, together with the
    configuration, a dictionary, and generates JSON markup encoding the
    grammar. The created string is then returned as the result of the
    command.

CONFIGURATION
The JSON export plugin recognizes the following configuration variables
and changes its behaviour as they specify.

boolean indented
    If this flag is set the plugin will break the generated JSON code
    across lines and indent it according to its inner structure, with
    each key of a dictionary on a separate line.

    If this flag is not set (the default), the whole JSON object will
    be written on a single line, with minimum spacing between all
    elements.

boolean aligned
    If this flag is set the generator ensures that the values for the
    keys in a dictionary are vertically aligned with each other, for a
    nice table effect. Setting this flag also implies that indented is
    set.

    If this flag is not set (the default), the output is formatted as
    per the value of indented, without trying to align the values for
    dictionary keys.

Note that this plugin ignores the standard configuration variables
user, format, file, and name, and their values.

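As a rough illustration of what the two flags mean, the following
Python sketch renders a dictionary in the three possible layouts. It is
not the plugin's implementation: the function name render, the
four-space indent, and the exact spacing around the colons are
assumptions made for this example, and the whitespace the plugin
actually emits may differ.

```python
import json

def render(value, indented=False, aligned=False, level=0):
    """Render JSON in the spirit of the two flags (illustrative only)."""
    if aligned:
        indented = True                  # aligned implies indented
    if not isinstance(value, dict):
        return json.dumps(value)
    if not value:
        return "{}"
    if not indented:
        # Single line, minimum spacing between all elements.
        return "{" + ",".join(
            json.dumps(k) + ":" + render(v) for k, v in value.items()) + "}"
    pad = "    " * (level + 1)
    keys = [json.dumps(k) for k in value]
    # Pad every key to the widest one only when alignment is requested.
    width = max(map(len, keys)) if aligned else 0
    lines = [
        pad + k.ljust(width) + " : " + render(v, indented, aligned, level + 1)
        for k, v in zip(keys, value.values())
    ]
    return "{\n" + ",\n".join(lines) + "\n" + "    " * level + "}"

rule = {"is": "n Number", "mode": "value"}
compact = render(rule)                   # neither flag set
pretty = render(rule, indented=True)     # one key per line
table = render(rule, aligned=True)       # values start in the same column
print(compact)
print(pretty)
print(table)
```

All three strings decode to the same JSON value; only the layout
changes.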
JSON GRAMMAR EXCHANGE FORMAT
The JSON format for parsing expression grammars was designed as a data
exchange format not bound to Tcl. It was defined to allow the exchange
of grammars with PackRat/PEG based parser generators for other
languages.

It is formally specified by the rules below:

[1] The JSON of any PEG is a JSON object.

[2] This object holds a single key, pt::grammar::peg, and its value.
    This value holds the contents of the grammar.

[3] The contents of the grammar are a JSON object holding the set of
    nonterminal symbols and the starting expression. The relevant keys
    and their values are

    rules  The value is a JSON object whose keys are the names of the
           nonterminal symbols known to the grammar.

           [1] Each nonterminal symbol may occur only once.

           [2] The empty string is not a legal nonterminal symbol.

           [3] The value for each symbol is a JSON object itself. The
               relevant keys and their values in this dictionary are

               is    The value is a JSON string holding the Tcl
                     serialization of the parsing expression describing
                     the symbol's sentential structure, as specified in
                     the section PE serialization format.

               mode  The value is a JSON string holding one of three
                     values specifying how a parser should handle the
                     semantic value produced by the symbol.

                     value  The semantic value of the nonterminal
                            symbol is an abstract syntax tree
                            consisting of a single node for the
                            nonterminal itself, which has the ASTs of
                            the symbol's right hand side as its
                            children.

                     leaf   The semantic value of the nonterminal
                            symbol is an abstract syntax tree
                            consisting of a single node for the
                            nonterminal, without any children. Any ASTs
                            generated by the symbol's right hand side
                            are discarded.

                     void   The nonterminal has no semantic value. Any
                            ASTs generated by the symbol's right hand
                            side are discarded (as well).

    start  The value is a JSON string holding the Tcl serialization of
           the start parsing expression of the grammar, as specified in
           the section PE serialization format.

[4] The terminal symbols of the grammar are specified implicitly as the
    set of all terminal symbols used in the start expression and on the
    RHS of the grammar rules.

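For a consumer written in another language, rules [1] to [3] translate
into a handful of structural checks. The Python sketch below shows one
way to do this. The names load_peg_json and _unique are hypothetical
helpers invented for this example, and the checks cover only what the
rules above state; the contents of the is and start strings are not
inspected.

```python
import json

MODES = {"value", "leaf", "void"}

def _unique(pairs):
    """Reject JSON objects that repeat a key (each symbol occurs once)."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate key in JSON object")
    return dict(pairs)

def load_peg_json(text):
    """Parse text and return the grammar contents, or raise ValueError."""
    doc = json.loads(text, object_pairs_hook=_unique)
    if not isinstance(doc, dict) or list(doc) != ["pt::grammar::peg"]:
        raise ValueError("expected an object with the single key pt::grammar::peg")
    grammar = doc["pt::grammar::peg"]
    if not isinstance(grammar, dict):
        raise ValueError("grammar contents must be a JSON object")
    rules, start = grammar.get("rules"), grammar.get("start")
    if not isinstance(rules, dict) or not isinstance(start, str):
        raise ValueError("need an object under rules and a string under start")
    for name, rule in rules.items():
        if name == "":
            raise ValueError("the empty string is not a legal nonterminal symbol")
        if not isinstance(rule, dict) or not isinstance(rule.get("is"), str):
            raise ValueError("rule %r needs a string under is" % name)
        mode = rule.get("mode")
        if not isinstance(mode, str) or mode not in MODES:
            raise ValueError("rule %r has an unknown mode" % name)
    return grammar

sample = r'''
{"pt::grammar::peg": {
    "rules": {"Sign": {"is": "\/ {t -} {t +}", "mode": "value"}},
    "start": "n Sign"}}
'''
grammar = load_peg_json(sample)
print(grammar["start"])                  # n Sign
print(grammar["rules"]["Sign"]["is"])    # / {t -} {t +}
```

Note that the JSON escape \/ decodes to a plain slash, so the string
seen by the consumer is the Tcl serialization itself.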
As an aside to the advanced reader, this is pretty much the same as the
Tcl serialization of PEGs, as specified in section PEG serialization
format, except that the Tcl dictionaries and lists of that format are
mapped to JSON objects and arrays. Only the parsing expressions
themselves are not translated further, but kept as JSON strings
containing a nested Tcl list, and there is no concept of canonicity for
the JSON either.

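Because the parsing expressions stay nested Tcl lists inside JSON
strings, a non-Tcl consumer has to split them itself. The Python sketch
below is a minimal take on that, assuming the expressions use only
brace quoting (no backslash escapes or double-quoted words). The names
parse_tcl_list, parse_pe and terminals are hypothetical. The last
function also shows how the implicit terminal set of rule [4] can be
collected.

```python
COMBINATORS = {"/", "x", "*", "+", "&", "!", "?"}

def parse_tcl_list(text):
    """Split a Tcl list into its elements (brace quoting only)."""
    items, i, n = [], 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1
        elif text[i] == "{":
            depth, j = 1, i + 1
            while j < n and depth:
                depth += {"{": 1, "}": -1}.get(text[j], 0)
                j += 1
            if depth:
                raise ValueError("unbalanced braces")
            items.append(text[i + 1:j - 1])   # drop the outer braces
            i = j
        else:
            j = i
            while j < n and not text[j].isspace():
                j += 1
            items.append(text[i:j])
            i = j
    return items

def parse_pe(text):
    """Turn a serialized parsing expression into nested Python lists."""
    words = parse_tcl_list(text)
    if words[0] in COMBINATORS:
        return [words[0]] + [parse_pe(word) for word in words[1:]]
    return words              # atoms such as ['t', '+'], ['n', 'A'], ['dot']

def terminals(pe):
    """Collect the terminal strings used anywhere inside pe."""
    if pe[0] == "t":
        return {pe[1]}
    if pe[0] in COMBINATORS:
        return set().union(*(terminals(sub) for sub in pe[1:]))
    return set()

grammar = {
    "rules": {
        "Number": {"is": "x {? {n Sign}} {+ {n Digit}}", "mode": "value"},
        "Sign": {"is": "/ {t -} {t +}", "mode": "value"},
        "Digit": {"is": "/ {t 0} {t 1}", "mode": "value"},
    },
    "start": "n Number",
}
used = terminals(parse_pe(grammar["start"]))
for rule in grammar["rules"].values():
    used |= terminals(parse_pe(rule["is"]))
print(sorted(used))
```

The character-class atoms (alnum, digit, and so on) are treated as
plain atoms here and do not contribute to the terminal set.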
EXAMPLE
Assuming the following PEG for simple mathematical expressions

    PEG calculator (Expression)
        Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
        Sign       <- '-' / '+' ;
        Number     <- Sign? Digit+ ;
        Expression <- Term (AddOp Term)* ;
        MulOp      <- '*' / '/' ;
        Term       <- Factor (MulOp Factor)* ;
        AddOp      <- '-' / '+' ;
        Factor     <- '(' Expression ')' / Number ;
    END;

a JSON serialization for it is

    {
        "pt::grammar::peg" : {
            "rules" : {
                "AddOp" : {
                    "is" : "\/ {t -} {t +}",
                    "mode" : "value"
                },
                "Digit" : {
                    "is" : "\/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}",
                    "mode" : "value"
                },
                "Expression" : {
                    "is" : "x {n Term} {* {x {n AddOp} {n Term}}}",
                    "mode" : "value"
                },
                "Factor" : {
                    "is" : "\/ {x {t (} {n Expression} {t )}} {n Number}",
                    "mode" : "value"
                },
                "MulOp" : {
                    "is" : "\/ {t *} {t \/}",
                    "mode" : "value"
                },
                "Number" : {
                    "is" : "x {? {n Sign}} {+ {n Digit}}",
                    "mode" : "value"
                },
                "Sign" : {
                    "is" : "\/ {t -} {t +}",
                    "mode" : "value"
                },
                "Term" : {
                    "is" : "x {n Factor} {* {x {n MulOp} {n Factor}}}",
                    "mode" : "value"
                }
            },
            "start" : "n Expression"
        }
    }

and a Tcl serialization of the same is

    pt::grammar::peg {
        rules {
            AddOp      {is {/ {t -} {t +}} mode value}
            Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
            Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
            Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
            MulOp      {is {/ {t *} {t /}} mode value}
            Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
            Sign       {is {/ {t -} {t +}} mode value}
            Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
        }
        start {n Expression}
    }

The similarity of the latter to the JSON should be quite obvious.

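The correspondence between the two forms is close enough that the Tcl
text can be assembled from the JSON with a few string operations. The
Python sketch below illustrates this under two assumptions: nonterminal
names need no Tcl quoting (no whitespace or braces), and the embedded
parsing expressions are brace-balanced. The function name json_to_tcl
is hypothetical. Python's sorted orders by code point, which is not the
same as lsort -dict, so the result is a regular serialization and not
necessarily the canonical one.

```python
import json

def json_to_tcl(text):
    """Build a Tcl serialization of the grammar held in a JSON document."""
    grammar = json.loads(text)["pt::grammar::peg"]
    rules = " ".join(
        "%s {is {%s} mode %s}" % (name, rule["is"], rule["mode"])
        for name, rule in sorted(grammar["rules"].items())
    )
    return "pt::grammar::peg {rules {%s} start {%s}}" % (rules, grammar["start"])

document = r'''
{"pt::grammar::peg": {
    "rules": {
        "Sign":   {"is": "\/ {t -} {t +}", "mode": "value"},
        "Number": {"is": "x {? {n Sign}} {+ {n Digit}}", "mode": "value"}
    },
    "start": "n Number"}}
'''
serial = json_to_tcl(document)
print(serial)
```

The only real work is wrapping the is and start strings in braces,
since they already hold Tcl lists.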
PEG SERIALIZATION FORMAT
Here we specify the format used by the Parser Tools to serialize
Parsing Expression Grammars as immutable values for transport,
comparison, etc.

We distinguish between regular and canonical serializations. While a
PEG may have more than one regular serialization only exactly one of
them will be canonical.

regular serialization

    [1] The serialization of any PEG is a nested Tcl dictionary.

    [2] This dictionary holds a single key, pt::grammar::peg, and its
        value. This value holds the contents of the grammar.

    [3] The contents of the grammar are a Tcl dictionary holding the
        set of nonterminal symbols and the starting expression. The
        relevant keys and their values are

        rules  The value is a Tcl dictionary whose keys are the names
               of the nonterminal symbols known to the grammar.

               [1] Each nonterminal symbol may occur only once.

               [2] The empty string is not a legal nonterminal symbol.

               [3] The value for each symbol is a Tcl dictionary
                   itself. The relevant keys and their values in this
                   dictionary are

                   is    The value is the serialization of the parsing
                         expression describing the symbol's sentential
                         structure, as specified in the section PE
                         serialization format.

                   mode  The value can be one of three values
                         specifying how a parser should handle the
                         semantic value produced by the symbol.

                         value  The semantic value of the nonterminal
                                symbol is an abstract syntax tree
                                consisting of a single node for the
                                nonterminal itself, which has the ASTs
                                of the symbol's right hand side as its
                                children.

                         leaf   The semantic value of the nonterminal
                                symbol is an abstract syntax tree
                                consisting of a single node for the
                                nonterminal, without any children. Any
                                ASTs generated by the symbol's right
                                hand side are discarded.

                         void   The nonterminal has no semantic value.
                                Any ASTs generated by the symbol's
                                right hand side are discarded (as
                                well).

        start  The value is the serialization of the start parsing
               expression of the grammar, as specified in the section
               PE serialization format.

    [4] The terminal symbols of the grammar are specified implicitly as
        the set of all terminal symbols used in the start expression
        and on the RHS of the grammar rules.

canonical serialization
    The canonical serialization of a grammar has the format as
    specified in the previous item, and then additionally satisfies the
    constraints below, which make it unique among all the possible
    serializations of this grammar.

    [1] The keys found in all the nested Tcl dictionaries are sorted in
        ascending dictionary order, as generated by Tcl's builtin
        command lsort -increasing -dict.

    [2] The string representation of the value is the canonical
        representation of a Tcl dictionary. I.e. it does not contain
        superfluous whitespace.

EXAMPLE
Assuming the following PEG for simple mathematical expressions

    PEG calculator (Expression)
        Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
        Sign       <- '-' / '+' ;
        Number     <- Sign? Digit+ ;
        Expression <- Term (AddOp Term)* ;
        MulOp      <- '*' / '/' ;
        Term       <- Factor (MulOp Factor)* ;
        AddOp      <- '-' / '+' ;
        Factor     <- '(' Expression ')' / Number ;
    END;

then its canonical serialization (except for whitespace) is

    pt::grammar::peg {
        rules {
            AddOp      {is {/ {t -} {t +}} mode value}
            Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
            Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
            Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
            MulOp      {is {/ {t *} {t /}} mode value}
            Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
            Sign       {is {/ {t -} {t +}} mode value}
            Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
        }
        start {n Expression}
    }

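The three mode values used in the rules above decide what a parser
keeps once a nonterminal has matched. The Python sketch below restates
them as code. The function name semantic_value and the node shape
(name, start, end, children) are assumptions made for this example;
the format itself does not prescribe how a parser represents its
abstract syntax trees.

```python
def semantic_value(name, mode, start, end, children):
    """Return the AST node for a matched nonterminal, or None for void."""
    if mode == "value":
        # One node for the nonterminal, with the ASTs of its right hand
        # side as children.
        return (name, start, end, list(children))
    if mode == "leaf":
        # One node for the nonterminal, ASTs of the right hand side dropped.
        return (name, start, end, [])
    if mode == "void":
        # No semantic value at all.
        return None
    raise ValueError("unknown mode: %r" % (mode,))

digits = [("Digit", 0, 0, []), ("Digit", 1, 1, [])]
as_value = semantic_value("Number", "value", 0, 1, digits)
as_leaf = semantic_value("Number", "leaf", 0, 1, digits)
as_void = semantic_value("Number", "void", 0, 1, digits)
print(as_value)
print(as_leaf)
print(as_void)
```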
PE SERIALIZATION FORMAT
Here we specify the format used by the Parser Tools to serialize
Parsing Expressions as immutable values for transport, comparison, etc.

We distinguish between regular and canonical serializations. While a
parsing expression may have more than one regular serialization only
exactly one of them will be canonical.

Regular serialization

    Atomic Parsing Expressions

        [1]  The string epsilon is an atomic parsing expression. It
             matches the empty string.

        [2]  The string dot is an atomic parsing expression. It matches
             any character.

        [3]  The string alnum is an atomic parsing expression. It
             matches any Unicode alphabet or digit character. This is a
             custom extension of PEs based on Tcl's builtin command
             string is.

        [4]  The string alpha is an atomic parsing expression. It
             matches any Unicode alphabet character. This is a custom
             extension of PEs based on Tcl's builtin command string is.

        [5]  The string ascii is an atomic parsing expression. It
             matches any Unicode character below U+0080. This is a
             custom extension of PEs based on Tcl's builtin command
             string is.

        [6]  The string control is an atomic parsing expression. It
             matches any Unicode control character. This is a custom
             extension of PEs based on Tcl's builtin command string is.

        [7]  The string digit is an atomic parsing expression. It
             matches any Unicode digit character. Note that this
             includes characters outside of the [0..9] range. This is a
             custom extension of PEs based on Tcl's builtin command
             string is.

        [8]  The string graph is an atomic parsing expression. It
             matches any Unicode printing character, except for space.
             This is a custom extension of PEs based on Tcl's builtin
             command string is.

        [9]  The string lower is an atomic parsing expression. It
             matches any Unicode lower-case alphabet character. This is
             a custom extension of PEs based on Tcl's builtin command
             string is.

        [10] The string print is an atomic parsing expression. It
             matches any Unicode printing character, including space.
             This is a custom extension of PEs based on Tcl's builtin
             command string is.

        [11] The string punct is an atomic parsing expression. It
             matches any Unicode punctuation character. This is a
             custom extension of PEs based on Tcl's builtin command
             string is.

        [12] The string space is an atomic parsing expression. It
             matches any Unicode space character. This is a custom
             extension of PEs based on Tcl's builtin command string is.

        [13] The string upper is an atomic parsing expression. It
             matches any Unicode upper-case alphabet character. This is
             a custom extension of PEs based on Tcl's builtin command
             string is.

        [14] The string wordchar is an atomic parsing expression. It
             matches any Unicode word character. This is any
             alphanumeric character (see alnum), and any connector
             punctuation characters (e.g. underscore). This is a custom
             extension of PEs based on Tcl's builtin command string is.

        [15] The string xdigit is an atomic parsing expression. It
             matches any hexadecimal digit character. This is a custom
             extension of PEs based on Tcl's builtin command string is.

        [16] The string ddigit is an atomic parsing expression. It
             matches any decimal digit character. This is a custom
             extension of PEs based on Tcl's builtin command regexp.

        [17] The expression [list t x] is an atomic parsing expression.
             It matches the terminal string x.

        [18] The expression [list n A] is an atomic parsing expression.
             It matches the nonterminal A.

    Combined Parsing Expressions

        [1]  For parsing expressions e1, e2, ... the result of
             [list / e1 e2 ... ] is a parsing expression as well. This
             is the ordered choice, aka prioritized choice.

        [2]  For parsing expressions e1, e2, ... the result of
             [list x e1 e2 ... ] is a parsing expression as well. This
             is the sequence.

        [3]  For a parsing expression e the result of [list * e] is a
             parsing expression as well. This is the Kleene closure,
             describing zero or more repetitions.

        [4]  For a parsing expression e the result of [list + e] is a
             parsing expression as well. This is the positive Kleene
             closure, describing one or more repetitions.

        [5]  For a parsing expression e the result of [list & e] is a
             parsing expression as well. This is the and lookahead
             predicate.

        [6]  For a parsing expression e the result of [list ! e] is a
             parsing expression as well. This is the not lookahead
             predicate.

        [7]  For a parsing expression e the result of [list ? e] is a
             parsing expression as well. This is the optional input.

Canonical serialization
    The canonical serialization of a parsing expression has the format
    as specified in the previous item, and then additionally satisfies
    the constraints below, which make it unique among all the possible
    serializations of this parsing expression.

    [1] The string representation of the value is the canonical
        representation of a pure Tcl list. I.e. it does not contain
        superfluous whitespace.

    [2] Terminals are not encoded as ranges (where start and end of the
        range are identical).

EXAMPLE
Assuming the parsing expression shown on the right-hand side of the
rule

    Expression <- Term (AddOp Term)*

then its canonical serialization (except for whitespace) is

    {x {n Term} {* {x {n AddOp} {n Term}}}}

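To make the meaning of the operators concrete, the Python sketch below
interprets serialized parsing expressions directly. It assumes the
expressions have already been turned into nested Python lists (for
instance ["t", "+"] for the Tcl list {t +}), supports only epsilon,
dot, t, n and the seven combinators, and makes no attempt at
memoization or at detecting left recursion. The names match and
accepts are hypothetical; this is not how the Parser Tools runtime is
implemented.

```python
def match(pe, text, pos, rules):
    """Return the position after matching pe at pos, or None on failure."""
    op = pe[0]
    if op == "epsilon":
        return pos
    if op == "dot":
        return pos + 1 if pos < len(text) else None
    if op == "t":
        return pos + len(pe[1]) if text.startswith(pe[1], pos) else None
    if op == "n":
        return match(rules[pe[1]], text, pos, rules)
    if op == "/":                        # ordered choice: first success wins
        for alternative in pe[1:]:
            end = match(alternative, text, pos, rules)
            if end is not None:
                return end
        return None
    if op == "x":                        # sequence: every part, left to right
        for part in pe[1:]:
            pos = match(part, text, pos, rules)
            if pos is None:
                return None
        return pos
    if op in ("*", "+"):                 # greedy repetition
        count = 0
        while True:
            end = match(pe[1], text, pos, rules)
            if end is None:
                break
            count += 1
            if end == pos:               # empty match: stop, avoid looping
                break
            pos = end
        return pos if (op == "*" or count > 0) else None
    if op == "?":                        # optional input
        end = match(pe[1], text, pos, rules)
        return pos if end is None else end
    if op == "&":                        # and predicate: consumes nothing
        return pos if match(pe[1], text, pos, rules) is not None else None
    if op == "!":                        # not predicate: consumes nothing
        return pos if match(pe[1], text, pos, rules) is None else None
    raise ValueError("unsupported parsing expression: %r" % (op,))

# The calculator grammar from the examples, written as nested lists.
rules = {
    "AddOp": ["/", ["t", "-"], ["t", "+"]],
    "Digit": ["/"] + [["t", d] for d in "0123456789"],
    "Expression": ["x", ["n", "Term"],
                   ["*", ["x", ["n", "AddOp"], ["n", "Term"]]]],
    "Factor": ["/", ["x", ["t", "("], ["n", "Expression"], ["t", ")"]],
               ["n", "Number"]],
    "MulOp": ["/", ["t", "*"], ["t", "/"]],
    "Number": ["x", ["?", ["n", "Sign"]], ["+", ["n", "Digit"]]],
    "Sign": ["/", ["t", "-"], ["t", "+"]],
    "Term": ["x", ["n", "Factor"],
             ["*", ["x", ["n", "MulOp"], ["n", "Factor"]]]],
}
start = ["n", "Expression"]

def accepts(text):
    """True if the start expression consumes the whole input."""
    return match(start, text, 0, rules) == len(text)

print(accepts("1+2*(3-4)"))   # True
print(accepts("1+"))          # False
```

The second input fails because the start expression stops after the 1,
leaving the trailing + unconsumed.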
BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category pt of the
Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
report any ideas for enhancements you may have for either package
and/or documentation.

When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.

Note further that attachments are strongly preferred over inlined
patches. Attachments can be made by going to the Edit form of the
ticket immediately after its creation, and then using the left-most
button in the secondary navigation bar.

KEYWORDS
EBNF, JSON, LL(k), PEG, TDPL, context-free languages, export,
expression, grammar, matching, parser, parsing expression, parsing
expression grammar, plugin, push down automaton, recursive descent,
serialization, state, top-down parsing languages, transducer

CATEGORY
Parsing and Grammars

COPYRIGHT
Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1              pt::peg::export::json(n)