1pt::peg::to::json(n) Parser Tools pt::peg::to::json(n)
2
3
4
5______________________________________________________________________________
6
8 pt::peg::to::json - PEG Conversion. Write JSON format
9
11 package require Tcl 8.5
12
13 package require pt::peg::to::json ?1?
14
15 package require pt::peg
16
17 package require json::write
18
19 pt::peg::to::json reset
20
21 pt::peg::to::json configure
22
23 pt::peg::to::json configure option
24
25 pt::peg::to::json configure option value...
26
27 pt::peg::to::json convert serial
28
29______________________________________________________________________________
30
Are you lost? Do you have trouble understanding this document? In
that case please read the overview provided by the Introduction to
Parser Tools. That document is the entry point to the whole system the
current package is a part of.
36
37 This package implements the converter from parsing expression grammars
38 to JSON markup.
39
It resides in the Export section of the Core Layer of Parser Tools, and
can be used either directly with the other packages of this layer, or
indirectly through the export manager provided by pt::peg::export. The
latter is intended for use in untrusted environments and is done through
the corresponding export plugin pt::peg::export::json, which sits between
converter and export manager.
46
47 IMAGE: arch_core_eplugins
48
50 The API provided by this package satisfies the specification of the
51 Converter API found in the Parser Tools Export API specification.
52
53 pt::peg::to::json reset
54 This command resets the configuration of the package to its de‐
55 fault settings.
56
57 pt::peg::to::json configure
58 This command returns a dictionary containing the current config‐
59 uration of the package.
60
61 pt::peg::to::json configure option
62 This command returns the current value of the specified configu‐
63 ration option of the package. For the set of legal options,
64 please read the section Options.
65
66 pt::peg::to::json configure option value...
This command sets the given configuration options of the package
to the specified values. For the set of legal options, please
read the section Options.
70
71 pt::peg::to::json convert serial
72 This command takes the canonical serialization of a parsing ex‐
73 pression grammar, as specified in section PEG serialization for‐
74 mat, and contained in serial, and generates JSON markup encoding
75 the grammar, per the current package configuration. The created
76 string is then returned as the result of the command.
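A minimal usage sketch of the commands above, assuming Tcllib is installed so that the packages from the synopsis can be loaded. The one-rule grammar and the variable names are made up for illustration. The serialization is assembled with list and dict create so that its string form carries no superfluous whitespace, which the canonical form requires.

```tcl
package require Tcl 8.5
package require pt::peg::to::json

# Hypothetical one-rule grammar:  Digit <- '0' / '1'
# Keys are added in sorted order (is, mode / rules, start).
set rules  [dict create Digit [dict create is {/ {t 0} {t 1}} mode value]]
set serial [list pt::grammar::peg [dict create rules $rules start {n Digit}]]

pt::peg::to::json reset                 ;# back to the default settings
set json [pt::peg::to::json convert $serial]
puts $json
```

The result is an ordinary string, so it can be written to a file or passed to any JSON consumer.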
77
OPTIONS
79 The converter to the JSON grammar exchange format recognizes the fol‐
80 lowing configuration variables and changes its behaviour as they spec‐
81 ify.
82
83 -file string
84 The value of this option is the name of the file or other entity
85 from which the grammar came, for which the command is run. The
86 default value is unknown.
87
88 -name string
89 The value of this option is the name of the grammar we are pro‐
90 cessing. The default value is a_pe_grammar.
91
92 -user string
93 The value of this option is the name of the user for which the
94 command is run. The default value is unknown.
95
96 -indented boolean
97 If this option is set the system will break the generated JSON
98 across lines and indent it according to its inner structure,
99 with each key of a dictionary on a separate line.
100
101 If the option is not set (the default), the whole JSON object
102 will be written on a single line, with minimum spacing between
103 all elements.
104
105 -aligned boolean
106 If this option is set the system will ensure that the values for
107 the keys in a dictionary are vertically aligned with each other,
108 for a nice table effect. To make this work this also implies
109 that -indented is set.
110
If the option is not set (the default), the output is formatted
as per the value of -indented, without trying to align the values
for dictionary keys.
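The option handling described above might be exercised as in the following sketch. It assumes Tcllib is installed; the grammar name calculator and the user name alice are placeholder values, not anything the package prescribes.

```tcl
package require Tcl 8.5
package require pt::peg::to::json

pt::peg::to::json reset
pt::peg::to::json configure -name calculator -user alice -indented 1

# Query a single option, then the whole configuration dictionary.
puts [pt::peg::to::json configure -name]
dict for {option value} [pt::peg::to::json configure] {
    puts "$option = $value"
}
```

Because the configuration is package-wide state, calling reset before configuring keeps settings from an earlier conversion from leaking into the next one.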
114
The JSON format for parsing expression grammars was written as a data
exchange format not bound to Tcl. It was defined to allow the exchange
of grammars with PackRat/PEG based parser generators for other
languages.
120
121 It is formally specified by the rules below:
122
123 [1] The JSON of any PEG is a JSON object.
124
125 [2] This object holds a single key, pt::grammar::peg, and its value.
126 This value holds the contents of the grammar.
127
128 [3] The contents of the grammar are a JSON object holding the set of
129 nonterminal symbols and the starting expression. The relevant
130 keys and their values are
131
132 rules The value is a JSON object whose keys are the names of
133 the nonterminal symbols known to the grammar.
134
135 [1] Each nonterminal symbol may occur only once.
136
137 [2] The empty string is not a legal nonterminal sym‐
138 bol.
139
140 [3] The value for each symbol is a JSON object itself.
141 The relevant keys and their values in this dictio‐
142 nary are
143
is The value is a JSON string holding the Tcl
serialization of the parsing expression
describing the symbol's sentential structure,
as specified in the section PE serialization
format.

mode The value is a JSON string holding one of
three values specifying how a parser should
handle the semantic value produced by the
symbol.
154
value The semantic value of the nonterminal
symbol is an abstract syntax tree
consisting of a single node for the
nonterminal itself, which has the ASTs
of the symbol's right hand side as its
children.

leaf The semantic value of the nonterminal
symbol is an abstract syntax tree
consisting of a single node for the
nonterminal, without any children. Any
ASTs generated by the symbol's right
hand side are discarded.
169
170 void The nonterminal has no semantic
171 value. Any ASTs generated by the
172 symbol's right hand side are dis‐
173 carded (as well).
174
175 start The value is a JSON string holding the Tcl serialization
176 of the start parsing expression of the grammar, as speci‐
177 fied in the section PE serialization format.
178
179 [4] The terminal symbols of the grammar are specified implicitly as
180 the set of all terminal symbols used in the start expression and
181 on the RHS of the grammar rules.
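The three mode values listed under rule [3] can be illustrated with a small sketch. The AST shape used here, a Tcl list of the form {symbol child child ...}, and the helper name semantic_value are assumptions made for this example only. They are not the AST format used by Parser Tools.

```tcl
package require Tcl 8.5

# Hypothetical helper: returns the list of ASTs (zero or one) that a
# nonterminal contributes, given its mode and the ASTs of its right hand side.
proc semantic_value {mode sym childAsts} {
    switch -exact -- $mode {
        value   { return [list [list $sym {*}$childAsts]] }
        leaf    { return [list [list $sym]] }
        void    { return {} }
        default { error "unknown mode \"$mode\"" }
    }
}

puts [semantic_value value Number {Sign Digit}]   ;# node with children
puts [semantic_value leaf  Number {Sign Digit}]   ;# node without children
puts [semantic_value void  Number {Sign Digit}]   ;# nothing at all
```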
182
183 As an aside to the advanced reader, this is pretty much the same as the
184 Tcl serialization of PE grammars, as specified in section PEG serial‐
185 ization format, except that the Tcl dictionaries and lists of that for‐
186 mat are mapped to JSON objects and arrays. Only the parsing expressions
187 themselves are not translated further, but kept as JSON strings con‐
188 taining a nested Tcl list, and there is no concept of canonicity for
189 the JSON either.
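Since the parsing expressions stay Tcl lists inside JSON strings, a Tcl consumer can parse the JSON and then use plain list commands on them. The sketch below assumes Tcllib's json package is installed; the one-rule JSON text is a made-up sample that follows the rules above.

```tcl
package require Tcl 8.5
package require json   ;# Tcllib's JSON parser (assumed to be installed)

set text {{"pt::grammar::peg":{"rules":{"Sign":{"is":"\/ {t -} {t +}","mode":"value"}},"start":"n Sign"}}}

# JSON objects come back as Tcl dictionaries.
set grammar [dict get [json::json2dict $text] pt::grammar::peg]
set pe      [dict get $grammar rules Sign is]

puts [lindex $pe 0]       ;# operator of the expression
puts [lrange $pe 1 end]   ;# its operands, still nested Tcl lists
```

Consumers in other languages need their own reader for the nested Tcl list syntax, because the JSON layer does not decompose the expressions any further.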
190
191 EXAMPLE
192 Assuming the following PEG for simple mathematical expressions
193
    PEG calculator (Expression)
        Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
        Sign       <- '-' / '+' ;
        Number     <- Sign? Digit+ ;
        Expression <- Term (AddOp Term)* ;
        MulOp      <- '*' / '/' ;
        Term       <- Factor (MulOp Factor)* ;
        AddOp      <- '+'/'-' ;
        Factor     <- '(' Expression ')' / Number ;
    END;
204
205
206 a JSON serialization for it is
207
    {
        "pt::grammar::peg" : {
            "rules" : {
                "AddOp" : {
                    "is" : "\/ {t -} {t +}",
                    "mode" : "value"
                },
                "Digit" : {
                    "is" : "\/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}",
                    "mode" : "value"
                },
                "Expression" : {
                    "is" : "x {n Term} {* {x {n AddOp} {n Term}}}",
                    "mode" : "value"
                },
                "Factor" : {
                    "is" : "\/ {x {t (} {n Expression} {t )}} {n Number}",
                    "mode" : "value"
                },
                "MulOp" : {
                    "is" : "\/ {t *} {t \/}",
                    "mode" : "value"
                },
                "Number" : {
                    "is" : "x {? {n Sign}} {+ {n Digit}}",
                    "mode" : "value"
                },
                "Sign" : {
                    "is" : "\/ {t -} {t +}",
                    "mode" : "value"
                },
                "Term" : {
                    "is" : "x {n Factor} {* {x {n MulOp} {n Factor}}}",
                    "mode" : "value"
                }
            },
            "start" : "n Expression"
        }
    }
247
248
249 and a Tcl serialization of the same is
250
    pt::grammar::peg {
        rules {
            AddOp      {is {/ {t -} {t +}} mode value}
            Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
            Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
            Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
            MulOp      {is {/ {t *} {t /}} mode value}
            Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
            Sign       {is {/ {t -} {t +}} mode value}
            Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
        }
        start {n Expression}
    }
264
265
266 The similarity of the latter to the JSON should be quite obvious.
267
PEG SERIALIZATION FORMAT
269 Here we specify the format used by the Parser Tools to serialize Pars‐
270 ing Expression Grammars as immutable values for transport, comparison,
271 etc.
272
We distinguish between regular and canonical serializations. While a
PEG may have more than one regular serialization, exactly one of them
will be canonical.
276
277 regular serialization
278
279 [1] The serialization of any PEG is a nested Tcl dictionary.
280
281 [2] This dictionary holds a single key, pt::grammar::peg, and
282 its value. This value holds the contents of the grammar.
283
284 [3] The contents of the grammar are a Tcl dictionary holding
285 the set of nonterminal symbols and the starting expres‐
286 sion. The relevant keys and their values are
287
288 rules The value is a Tcl dictionary whose keys are the
289 names of the nonterminal symbols known to the
290 grammar.
291
292 [1] Each nonterminal symbol may occur only
293 once.
294
295 [2] The empty string is not a legal nonterminal
296 symbol.
297
298 [3] The value for each symbol is a Tcl dictio‐
299 nary itself. The relevant keys and their
300 values in this dictionary are
301
is The value is the serialization of
the parsing expression describing
the symbol's sentential structure, as
specified in the section PE
serialization format.
307
308 mode The value can be one of three values
309 specifying how a parser should han‐
310 dle the semantic value produced by
311 the symbol.
312
value The semantic value of the
nonterminal symbol is an abstract
syntax tree consisting of a single
node for the nonterminal itself,
which has the ASTs of the symbol's
right hand side as its children.

leaf The semantic value of the
nonterminal symbol is an abstract
syntax tree consisting of a single
node for the nonterminal, without
any children. Any ASTs generated
by the symbol's right hand side
are discarded.
330
331 void The nonterminal has no seman‐
332 tic value. Any ASTs generated
333 by the symbol's right hand
334 side are discarded (as well).
335
336 start The value is the serialization of the start pars‐
337 ing expression of the grammar, as specified in the
338 section PE serialization format.
339
340 [4] The terminal symbols of the grammar are specified implic‐
341 itly as the set of all terminal symbols used in the start
342 expression and on the RHS of the grammar rules.
343
344 canonical serialization
345 The canonical serialization of a grammar has the format as spec‐
346 ified in the previous item, and then additionally satisfies the
347 constraints below, which make it unique among all the possible
348 serializations of this grammar.
349
350 [1] The keys found in all the nested Tcl dictionaries are
351 sorted in ascending dictionary order, as generated by
352 Tcl's builtin command lsort -increasing -dict.
353
354 [2] The string representation of the value is the canonical
355 representation of a Tcl dictionary. I.e. it does not con‐
356 tain superfluous whitespace.
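The two constraints can be sketched in plain Tcl: sort the keys with lsort -increasing -dict and let list and dict create generate the string form. The helper name canon_peg is hypothetical, and the sketch leaves the parsing expressions themselves untouched, so it only yields a canonical result when they are already in canonical form.

```tcl
package require Tcl 8.5

# Hypothetical helper: rebuild a regular PEG serialization with sorted keys.
proc canon_peg {serial} {
    lassign $serial tag contents
    set rules {}
    foreach sym [lsort -increasing -dict [dict keys [dict get $contents rules]]] {
        set def [dict get $contents rules $sym]
        # Insert "is" before "mode" so the per-symbol keys are sorted too.
        dict set rules $sym [dict create is [dict get $def is] mode [dict get $def mode]]
    }
    # "rules" sorts before "start"; the generated string form of the
    # list and dictionaries carries no superfluous whitespace.
    return [list $tag [dict create rules $rules start [dict get $contents start]]]
}

# A regular, non-canonical serialization: unsorted keys, extra whitespace.
set regular {
    pt::grammar::peg {
        start {n Sign}
        rules {
            Sign  {mode value   is {/ {t -} {t +}}}
            AddOp {mode value   is {/ {t +} {t -}}}
        }
    }
}
puts [canon_peg $regular]
```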
357
358 EXAMPLE
359 Assuming the following PEG for simple mathematical expressions
360
    PEG calculator (Expression)
        Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
        Sign       <- '-' / '+' ;
        Number     <- Sign? Digit+ ;
        Expression <- Term (AddOp Term)* ;
        MulOp      <- '*' / '/' ;
        Term       <- Factor (MulOp Factor)* ;
        AddOp      <- '+'/'-' ;
        Factor     <- '(' Expression ')' / Number ;
    END;
371
372
373 then its canonical serialization (except for whitespace) is
374
    pt::grammar::peg {
        rules {
            AddOp      {is {/ {t -} {t +}} mode value}
            Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
            Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
            Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
            MulOp      {is {/ {t *} {t /}} mode value}
            Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
            Sign       {is {/ {t -} {t +}} mode value}
            Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
        }
        start {n Expression}
    }
388
389
PE SERIALIZATION FORMAT
391 Here we specify the format used by the Parser Tools to serialize Pars‐
392 ing Expressions as immutable values for transport, comparison, etc.
393
We distinguish between regular and canonical serializations. While a
parsing expression may have more than one regular serialization,
exactly one of them will be canonical.
397
398 Regular serialization
399
400 Atomic Parsing Expressions
401
402 [1] The string epsilon is an atomic parsing expres‐
403 sion. It matches the empty string.
404
405 [2] The string dot is an atomic parsing expression. It
406 matches any character.
407
408 [3] The string alnum is an atomic parsing expression.
409 It matches any Unicode alphabet or digit charac‐
410 ter. This is a custom extension of PEs based on
411 Tcl's builtin command string is.
412
413 [4] The string alpha is an atomic parsing expression.
414 It matches any Unicode alphabet character. This is
415 a custom extension of PEs based on Tcl's builtin
416 command string is.
417
[5] The string ascii is an atomic parsing expression.
It matches any Unicode character below U+0080. This
is a custom extension of PEs based on Tcl's
builtin command string is.
422
423 [6] The string control is an atomic parsing expres‐
424 sion. It matches any Unicode control character.
425 This is a custom extension of PEs based on Tcl's
426 builtin command string is.
427
428 [7] The string digit is an atomic parsing expression.
429 It matches any Unicode digit character. Note that
430 this includes characters outside of the [0..9]
431 range. This is a custom extension of PEs based on
432 Tcl's builtin command string is.
433
434 [8] The string graph is an atomic parsing expression.
435 It matches any Unicode printing character, except
436 for space. This is a custom extension of PEs based
437 on Tcl's builtin command string is.
438
439 [9] The string lower is an atomic parsing expression.
440 It matches any Unicode lower-case alphabet charac‐
441 ter. This is a custom extension of PEs based on
442 Tcl's builtin command string is.
443
444 [10] The string print is an atomic parsing expression.
445 It matches any Unicode printing character, includ‐
446 ing space. This is a custom extension of PEs based
447 on Tcl's builtin command string is.
448
449 [11] The string punct is an atomic parsing expression.
450 It matches any Unicode punctuation character. This
451 is a custom extension of PEs based on Tcl's
452 builtin command string is.
453
454 [12] The string space is an atomic parsing expression.
455 It matches any Unicode space character. This is a
456 custom extension of PEs based on Tcl's builtin
457 command string is.
458
459 [13] The string upper is an atomic parsing expression.
460 It matches any Unicode upper-case alphabet charac‐
461 ter. This is a custom extension of PEs based on
462 Tcl's builtin command string is.
463
464 [14] The string wordchar is an atomic parsing expres‐
465 sion. It matches any Unicode word character. This
466 is any alphanumeric character (see alnum), and any
467 connector punctuation characters (e.g. under‐
468 score). This is a custom extension of PEs based on
469 Tcl's builtin command string is.
470
471 [15] The string xdigit is an atomic parsing expression.
472 It matches any hexadecimal digit character. This
473 is a custom extension of PEs based on Tcl's
474 builtin command string is.
475
476 [16] The string ddigit is an atomic parsing expression.
477 It matches any decimal digit character. This is a
478 custom extension of PEs based on Tcl's builtin
479 command regexp.
480
481 [17] The expression [list t x] is an atomic parsing ex‐
482 pression. It matches the terminal string x.
483
484 [18] The expression [list n A] is an atomic parsing ex‐
485 pression. It matches the nonterminal A.
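The character classes in items [2] to [16] map directly onto Tcl built-ins, which the following sketch makes concrete. The helper name atom_matches is hypothetical. It covers only the character classes, not epsilon, t or n, and it is not how the Parser Tools runtime is implemented.

```tcl
package require Tcl 8.5

# Hypothetical helper: does the single character ch match the named class?
proc atom_matches {class ch} {
    switch -exact -- $class {
        dot     { return [expr {[string length $ch] == 1}] }
        ddigit  { return [regexp {^[0-9]$} $ch] }
        default { return [string is $class -strict $ch] }
    }
}

# U+0663 (ARABIC-INDIC DIGIT THREE) is a Unicode digit outside [0..9].
puts [atom_matches digit  "\u0663"]
puts [atom_matches ddigit "\u0663"]
puts [atom_matches ddigit 7]
```

The first two calls show the difference item [7] points out: digit accepts digits outside the [0..9] range, while ddigit does not.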
486
487 Combined Parsing Expressions
488
489 [1] For parsing expressions e1, e2, ... the result of
490 [list / e1 e2 ... ] is a parsing expression as
491 well. This is the ordered choice, aka prioritized
492 choice.
493
494 [2] For parsing expressions e1, e2, ... the result of
495 [list x e1 e2 ... ] is a parsing expression as
496 well. This is the sequence.
497
[3] For a parsing expression e the result of [list *
e] is a parsing expression as well. This is the
Kleene closure, describing zero or more
repetitions.

[4] For a parsing expression e the result of [list +
e] is a parsing expression as well. This is the
positive Kleene closure, describing one or more
repetitions.
507
508 [5] For a parsing expression e the result of [list &
509 e] is a parsing expression as well. This is the
510 and lookahead predicate.
511
512 [6] For a parsing expression e the result of [list !
513 e] is a parsing expression as well. This is the
514 not lookahead predicate.
515
516 [7] For a parsing expression e the result of [list ?
517 e] is a parsing expression as well. This is the
518 optional input.
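Because every combined expression is a list whose first element is the operator, a serialization can be traversed with a short recursive procedure. The sketch below collects the nonterminals an expression refers to. The name pe_nonterminals is made up for this example, and any operator not listed above is treated as having no operands.

```tcl
package require Tcl 8.5

# Hypothetical helper: sorted list of nonterminals referenced by a PE.
proc pe_nonterminals {pe} {
    set operands [lassign $pe op]
    switch -exact -- $op {
        n { return [list [lindex $operands 0]] }
        t { return {} }
        / - x - * - + - & - ! - ? {
            set found {}
            foreach sub $operands {
                lappend found {*}[pe_nonterminals $sub]
            }
            return [lsort -unique $found]
        }
        default { return {} }
    }
}

puts [pe_nonterminals {x {n Term} {* {x {n AddOp} {n Term}}}}]
```

The default branch covers the atomic names such as epsilon, dot and alnum.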
519
520 Canonical serialization
521 The canonical serialization of a parsing expression has the for‐
522 mat as specified in the previous item, and then additionally
523 satisfies the constraints below, which make it unique among all
524 the possible serializations of this parsing expression.
525
526 [1] The string representation of the value is the canonical
527 representation of a pure Tcl list. I.e. it does not con‐
528 tain superfluous whitespace.
529
530 [2] Terminals are not encoded as ranges (where start and end
531 of the range are identical).
532
533 EXAMPLE
Assuming the parsing expression shown on the right-hand side of the
rule

    Expression <- Term (AddOp Term)*


then its canonical serialization (except for whitespace) is

    {x {n Term} {* {x {n AddOp} {n Term}}}}
543
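As the rules for atomic and combined expressions are phrased in terms of [list ...], the same value can be assembled step by step in Tcl. This is a small sketch with made-up variable names; the string form Tcl generates for a pure list contains no superfluous whitespace.

```tcl
package require Tcl 8.5

# Expression <- Term (AddOp Term)*
set repeat [list * [list x [list n AddOp] [list n Term]]]
set pe     [list x [list n Term] $repeat]

puts $pe   ;# prints: x {n Term} {* {x {n AddOp} {n Term}}}
```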
544
546 This document, and the package it describes, will undoubtedly contain
547 bugs and other problems. Please report such in the category pt of the
548 Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
549 report any ideas for enhancements you may have for either package
550 and/or documentation.
551
When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.
554
555 Note further that attachments are strongly preferred over inlined
556 patches. Attachments can be made by going to the Edit form of the
557 ticket immediately after its creation, and then using the left-most
558 button in the secondary navigation bar.
559
561 EBNF, JSON, LL(k), PEG, TDPL, context-free languages, conversion, ex‐
562 pression, format conversion, grammar, matching, parser, parsing expres‐
563 sion, parsing expression grammar, push down automaton, recursive de‐
564 scent, serialization, state, top-down parsing languages, transducer
565
567 Parsing and Grammars
568
570 Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>
571
572
573
574
575tcllib 1 pt::peg::to::json(n)