pt::peg::to::json(n)             Parser Tools             pt::peg::to::json(n)

______________________________________________________________________________

NAME
       pt::peg::to::json - PEG Conversion. Write JSON format

SYNOPSIS
       package require Tcl 8.5

       package require pt::peg::to::json ?1?

       package require pt::peg

       package require json::write

       pt::peg::to::json reset

       pt::peg::to::json configure

       pt::peg::to::json configure option

       pt::peg::to::json configure option value...

       pt::peg::to::json convert serial

______________________________________________________________________________

DESCRIPTION
       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. That document is the entry point to the whole system
       the current package is a part of.

       This package implements the converter from parsing expression
       grammars to JSON markup.

       It resides in the Export section of the Core Layer of Parser Tools,
       and can be used either directly with the other packages of this
       layer, or indirectly through the export manager provided by
       pt::peg::export. The latter is intended for use in untrusted
       environments and is done through the corresponding export plugin
       pt::peg::export::json sitting between converter and export manager.

       IMAGE: arch_core_eplugins

API
       The API provided by this package satisfies the specification of the
       Converter API found in the Parser Tools Export API specification.

       pt::peg::to::json reset
              This command resets the configuration of the package to its
              default settings.

       pt::peg::to::json configure
              This command returns a dictionary containing the current
              configuration of the package.

       pt::peg::to::json configure option
              This command returns the current value of the specified
              configuration option of the package. For the set of legal
              options, please read the section Options.

       pt::peg::to::json configure option value...
              This command sets the given configuration options of the
              package to the specified values. For the set of legal
              options, please read the section Options.

       pt::peg::to::json convert serial
              This command takes the canonical serialization of a parsing
              expression grammar, as specified in section PEG serialization
              format, and contained in serial, and generates JSON markup
              encoding the grammar, per the current package configuration.
              The created string is then returned as the result of the
              command.
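
       As a rough illustration of the commands above, the following sketch
       converts a tiny grammar. The one-rule grammar is made up for the
       example, and running it through pt::peg canonicalize first is an
       assumption about how a caller might obtain the canonical form that
       convert expects, not something this manpage prescribes.

```tcl
package require Tcl 8.5
package require pt::peg
package require pt::peg::to::json

# Hypothetical one-rule grammar, written as a regular serialization.
set grammar {pt::grammar::peg {rules {Sign {is {/ {t -} {t +}} mode value}} start {n Sign}}}

# convert takes the canonical serialization, so normalize first
# (assumption: the grammar may not be in canonical form yet).
set serial [pt::peg canonicalize $grammar]

# Start from the default settings, then ask for indented output.
pt::peg::to::json reset
pt::peg::to::json configure -indented 1

set json [pt::peg::to::json convert $serial]
puts $json
```

       Calling reset first keeps the result independent of whatever
       configuration an earlier caller may have left behind.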

OPTIONS
       The converter to the JSON grammar exchange format recognizes the
       following configuration variables and changes its behaviour as they
       specify.

       -file string
              The value of this option is the name of the file or other
              entity from which the grammar came, for which the command is
              run. The default value is unknown.

       -name string
              The value of this option is the name of the grammar we are
              processing. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for which
              the command is run. The default value is unknown.

       -indented boolean
              If this option is set the system will break the generated
              JSON across lines and indent it according to its inner
              structure, with each key of a dictionary on a separate line.

              If the option is not set (the default), the whole JSON object
              will be written on a single line, with minimum spacing
              between all elements.

       -aligned boolean
              If this option is set the system will ensure that the values
              for the keys in a dictionary are vertically aligned with each
              other, for a nice table effect. To make this work this also
              implies that -indented is set.

              If the option is not set (the default), the output is
              formatted as per the value of -indented, without trying to
              align the values for dictionary keys.
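
       The options above might be set and inspected as in the following
       sketch. All option values shown (calculator, calculator.peg, alice)
       are invented for the example.

```tcl
package require Tcl 8.5
package require pt::peg::to::json

pt::peg::to::json reset

# Set several options in one call; the values are made up.
pt::peg::to::json configure -name calculator -file calculator.peg -user alice

# Ask for aligned output; per the description this implies indentation.
pt::peg::to::json configure -aligned 1

# Query a single option, then walk the whole configuration dictionary.
puts [pt::peg::to::json configure -name]
dict for {option value} [pt::peg::to::json configure] {
    puts "$option = $value"
}
```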

JSON GRAMMAR EXCHANGE FORMAT
       The JSON format for parsing expression grammars was written as a data
       exchange format not bound to Tcl. It was defined to allow the
       exchange of grammars with PackRat/PEG based parser generators for
       other languages.

       It is formally specified by the rules below:

       [1]    The JSON of any PEG is a JSON object.

       [2]    This object holds a single key, pt::grammar::peg, and its
              value. This value holds the contents of the grammar.

       [3]    The contents of the grammar are a JSON object holding the set
              of nonterminal symbols and the starting expression. The
              relevant keys and their values are

              rules  The value is a JSON object whose keys are the names of
                     the nonterminal symbols known to the grammar.

                     [1]    Each nonterminal symbol may occur only once.

                     [2]    The empty string is not a legal nonterminal
                            symbol.

                     [3]    The value for each symbol is a JSON object
                            itself. The relevant keys and their values in
                            this dictionary are

                            is     The value is a JSON string holding the
                                   Tcl serialization of the parsing
                                   expression describing the symbol's
                                   sentential structure, as specified in
                                   the section PE serialization format.

                            mode   The value is a JSON string holding one
                                   of three values specifying how a parser
                                   should handle the semantic value
                                   produced by the symbol.

                                   value  The semantic value of the
                                          nonterminal symbol is an abstract
                                          syntax tree consisting of a
                                          single node for the nonterminal
                                          itself, which has the ASTs of the
                                          symbol's right hand side as its
                                          children.

                                   leaf   The semantic value of the
                                          nonterminal symbol is an abstract
                                          syntax tree consisting of a
                                          single node for the nonterminal,
                                          without any children. Any ASTs
                                          generated by the symbol's right
                                          hand side are discarded.

                                   void   The nonterminal has no semantic
                                          value. Any ASTs generated by the
                                          symbol's right hand side are
                                          discarded (as well).

              start  The value is a JSON string holding the Tcl
                     serialization of the start parsing expression of the
                     grammar, as specified in the section PE serialization
                     format.

       [4]    The terminal symbols of the grammar are specified implicitly
              as the set of all terminal symbols used in the start
              expression and on the RHS of the grammar rules.
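
       The three mode values can be pictured with a small sketch. The node
       layout used here (a list holding the symbol name followed by its
       child nodes) is purely illustrative, and the procedure name ast_for
       is hypothetical; Parser Tools defines its own AST format elsewhere.

```tcl
# Sketch: what a parser would hand upward for a nonterminal, depending
# on its mode. An empty list stands in for "no semantic value".
proc ast_for {mode symbol children} {
    switch -exact -- $mode {
        value   { return [list $symbol {*}$children] }
        leaf    { return [list $symbol] }
        void    { return {} }
        default { error "unknown mode \"$mode\"" }
    }
}

# Child nodes as they might come back from the right hand side.
set children {{Sign} {Digit} {Digit}}

puts "value: [ast_for value Number $children]"
puts "leaf:  [ast_for leaf  Number $children]"
puts "void:  [ast_for void  Number $children]"
```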

       As an aside to the advanced reader, this is pretty much the same as
       the Tcl serialization of PE grammars, as specified in section PEG
       serialization format, except that the Tcl dictionaries and lists of
       that format are mapped to JSON objects and arrays. Only the parsing
       expressions themselves are not translated further, but kept as JSON
       strings containing a nested Tcl list, and there is no concept of
       canonicity for the JSON either.
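
       To make the mapping concrete, the sketch below reads a grammar in the
       JSON exchange format back into Tcl. It assumes the json package from
       Tcllib is available; the JSON text is a made-up fragment (its Sign
       and Digit rules are left out for brevity) and is not verified as a
       complete grammar.

```tcl
package require Tcl 8.5
package require json

# A small fragment in the JSON exchange format (made up for the example).
set text {{
    "pt::grammar::peg" : {
        "rules" : {
            "Number" : { "is" : "x {? {n Sign}} {+ {n Digit}}", "mode" : "value" }
        },
        "start" : "n Number"
    }
}}

# JSON objects come back as Tcl dictionaries.
set grammar [dict get [json::json2dict $text] pt::grammar::peg]

# The parsing expressions arrive as strings holding nested Tcl lists,
# so ordinary list commands can take them apart.
set pe [dict get $grammar rules Number is]
puts "operator: [lindex $pe 0]"
puts "operands: [lrange $pe 1 end]"
puts "start:    [dict get $grammar start]"
```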

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
                  Sign       <- '-' / '+'                               ;
                  Number     <- Sign? Digit+                            ;
                  Expression <- Term (AddOp Term)*                      ;
                  MulOp      <- '*' / '/'                               ;
                  Term       <- Factor (MulOp Factor)*                  ;
                  AddOp      <- '+'/'-'                                 ;
                  Factor     <- '(' Expression ')' / Number             ;
              END;

       a JSON serialization for it is

              {
                  "pt::grammar::peg" : {
                      "rules" : {
                          "AddOp" : {
                              "is"   : "\/ {t -} {t +}",
                              "mode" : "value"
                          },
                          "Digit" : {
                              "is"   : "\/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}",
                              "mode" : "value"
                          },
                          "Expression" : {
                              "is"   : "x {n Term} {* {x {n AddOp} {n Term}}}",
                              "mode" : "value"
                          },
                          "Factor" : {
                              "is"   : "\/ {x {t (} {n Expression} {t )}} {n Number}",
                              "mode" : "value"
                          },
                          "MulOp" : {
                              "is"   : "\/ {t *} {t \/}",
                              "mode" : "value"
                          },
                          "Number" : {
                              "is"   : "x {? {n Sign}} {+ {n Digit}}",
                              "mode" : "value"
                          },
                          "Sign" : {
                              "is"   : "\/ {t -} {t +}",
                              "mode" : "value"
                          },
                          "Term" : {
                              "is"   : "x {n Factor} {* {x {n MulOp} {n Factor}}}",
                              "mode" : "value"
                          }
                      },
                      "start" : "n Expression"
                  }
              }

       and a Tcl serialization of the same is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t -} {t +}} mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
                      MulOp      {is {/ {t *} {t /}} mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
                      Sign       {is {/ {t -} {t +}} mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
                  }
                  start {n Expression}
              }

       The similarity of the latter to the JSON should be quite obvious.

PEG SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of
       them will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl
                     dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg,
                     and its value. This value holds the contents of the
                     grammar.

              [3]    The contents of the grammar are a Tcl dictionary
                     holding the set of nonterminal symbols and the
                     starting expression. The relevant keys and their
                     values are

                     rules  The value is a Tcl dictionary whose keys are
                            the names of the nonterminal symbols known to
                            the grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal
                                   nonterminal symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys
                                   and their values in this dictionary are

                                   is     The value is the serialization
                                          of the parsing expression
                                          describing the symbol's
                                          sentential structure, as
                                          specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three
                                          values specifying how a parser
                                          should handle the semantic value
                                          produced by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single
                                                 node for the nonterminal
                                                 itself, which has the ASTs
                                                 of the symbol's right hand
                                                 side as its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single
                                                 node for the nonterminal,
                                                 without any children. Any
                                                 ASTs generated by the
                                                 symbol's right hand side
                                                 are discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified
                            in the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in
                     the start expression and on the RHS of the grammar
                     rules.
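
       Rule [4] can be illustrated by walking a serialization and
       collecting the terminals it uses. This is a minimal sketch in plain
       Tcl: the procedure name terminals_of and the two-rule grammar are
       hypothetical, and only the operators described in this document are
       handled.

```tcl
# Collect the terminal strings used in one serialized parsing expression.
proc terminals_of {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        t { return [list [lindex $pe 1]] }
        / - x - * - + - & - ! - ? {
            set found {}
            foreach sub [lrange $pe 1 end] {
                lappend found {*}[terminals_of $sub]
            }
            return $found
        }
        default { return {} }
    }
}

# Hypothetical two-rule grammar, built as a regular serialization.
set grammar [dict create pt::grammar::peg [dict create \
    rules [dict create \
        Number [dict create is {x {? {n Sign}} {+ {t 1}}} mode value] \
        Sign   [dict create is {/ {t -} {t +}}            mode value]] \
    start {n Number}]]

# Terminals come from the start expression and from every rule's RHS.
set contents  [dict get $grammar pt::grammar::peg]
set terminals [terminals_of [dict get $contents start]]
dict for {symbol rule} [dict get $contents rules] {
    lappend terminals {*}[terminals_of [dict get $rule is]]
}
puts [lsort -unique $terminals]
```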

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among
              all the possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the
                     canonical representation of a Tcl dictionary. I.e. it
                     does not contain superfluous whitespace.
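
       The two constraints can be sketched in plain Tcl as follows. The
       helper name sorted_dict and the sample rules are hypothetical, and
       the sketch only reorders dictionary keys and regenerates the
       dictionary string; it does not normalize the parsing expressions
       stored under the is keys.

```tcl
# Rebuild a dictionary with its keys in "lsort -increasing -dict" order.
proc sorted_dict {dict} {
    set out {}
    foreach key [lsort -increasing -dict [dict keys $dict]] {
        dict set out $key [dict get $dict $key]
    }
    return $out
}

# Rules deliberately listed out of order, with extra whitespace.
set rules {
    Sign   {mode value   is {/ {t -} {t +}}}
    AddOp  {is {/ {t -} {t +}}   mode value}
}

# Sort the symbols, and the keys inside each rule. Because the result
# is built with dict set, its string form carries no extra whitespace.
set canonical {}
foreach symbol [lsort -increasing -dict [dict keys $rules]] {
    dict set canonical $symbol [sorted_dict [dict get $rules $symbol]]
}
puts $canonical
```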

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
                  Sign       <- '-' / '+'                               ;
                  Number     <- Sign? Digit+                            ;
                  Expression <- Term (AddOp Term)*                      ;
                  MulOp      <- '*' / '/'                               ;
                  Term       <- Factor (MulOp Factor)*                  ;
                  AddOp      <- '+'/'-'                                 ;
                  Factor     <- '(' Expression ')' / Number             ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t -} {t +}} mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
                      MulOp      {is {/ {t *} {t /}} mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
                      Sign       {is {/ {t -} {t +}} mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
                  }
                  start {n Expression}
              }

PE SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison,
       etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression.
                            It matches any character.

                     [3]    The string alnum is an atomic parsing
                            expression. It matches any Unicode alphabet or
                            digit character. This is a custom extension of
                            PEs based on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing
                            expression. It matches any Unicode alphabet
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [5]    The string ascii is an atomic parsing
                            expression. It matches any Unicode character
                            below U+0080. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing
                            expression. It matches any Unicode digit
                            character. Note that this includes characters
                            outside of the [0..9] range. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [8]    The string graph is an atomic parsing
                            expression. It matches any Unicode printing
                            character, except for space. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [9]    The string lower is an atomic parsing
                            expression. It matches any Unicode lower-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string
                            is.

                     [10]   The string print is an atomic parsing
                            expression. It matches any Unicode printing
                            character, including space. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [11]   The string punct is an atomic parsing
                            expression. It matches any Unicode punctuation
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [12]   The string space is an atomic parsing
                            expression. It matches any Unicode space
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [13]   The string upper is an atomic parsing
                            expression. It matches any Unicode upper-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string
                            is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word
                            character. This is any alphanumeric character
                            (see alnum), and any connector punctuation
                            characters (e.g. underscore). This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [15]   The string xdigit is an atomic parsing
                            expression. It matches any hexadecimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [16]   The string ddigit is an atomic parsing
                            expression. It matches any decimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result
                            of [list / e1 e2 ... ] is a parsing expression
                            as well. This is the ordered choice, aka
                            prioritized choice.

                     [2]    For parsing expressions e1, e2, ... the result
                            of [list x e1 e2 ... ] is a parsing expression
                            as well. This is the sequence.

                     [3]    For a parsing expression e the result of
                            [list * e] is a parsing expression as well.
                            This is the kleene closure, describing zero or
                            more repetitions.

                     [4]    For a parsing expression e the result of
                            [list + e] is a parsing expression as well.
                            This is the positive kleene closure, describing
                            one or more repetitions.

                     [5]    For a parsing expression e the result of
                            [list & e] is a parsing expression as well.
                            This is the and lookahead predicate.

                     [6]    For a parsing expression e the result of
                            [list ! e] is a parsing expression as well.
                            This is the not lookahead predicate.

                     [7]    For a parsing expression e the result of
                            [list ? e] is a parsing expression as well.
                            This is the optional input.
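
       Since every form above is written in terms of Tcl's list command,
       serialized expressions can be assembled the same way. The sketch
       below builds two right-hand sides of the calculator grammar used in
       the examples of this document; the variable names are arbitrary.

```tcl
# Expression <- Term (AddOp Term)*
set term  [list n Term]
set addop [list n AddOp]
set expression [list x $term [list * [list x $addop $term]]]
puts $expression

# Factor <- '(' Expression ')' / Number
set factor [list / \
    [list x [list t (] [list n Expression] [list t )]] \
    [list n Number]]
puts $factor
```

       Building the value with list, rather than typing the nested braces
       by hand, leaves quoting and spacing to Tcl.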

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then
              additionally satisfies the constraints below, which make it
              unique among all the possible serializations of this parsing
              expression.

              [1]    The string representation of the value is the
                     canonical representation of a pure Tcl list. I.e. it
                     does not contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and
                     end of the range are identical).
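
       Constraint [1] can be approximated by rebuilding an expression with
       list at every level, as in the sketch below. The procedure name
       canon_pe is hypothetical, only the operators described in this
       document are handled, and constraint [2] about ranges is not
       covered.

```tcl
# Rebuild a serialized parsing expression so that every nesting level
# is a freshly generated Tcl list without superfluous whitespace.
proc canon_pe {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        t - n { return [list $op [lindex $pe 1]] }
        / - x - * - + - & - ! - ? {
            set out [list $op]
            foreach sub [lrange $pe 1 end] {
                lappend out [canon_pe $sub]
            }
            return $out
        }
        default { return [list $op] }
    }
}

# A regular serialization with irregular spacing and line breaks.
set loose {
    x   {n Term}
        {*  {x {n AddOp}   {n Term}}}
}
set tight [canon_pe $loose]
puts $tight
```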

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

              Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

              {x {n Term} {* {x {n AddOp} {n Term}}}}

BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS
       EBNF, JSON, LL(k), PEG, TDPL, context-free languages, conversion,
       expression, format conversion, grammar, matching, parser, parsing
       expression, parsing expression grammar, push down automaton,
       recursive descent, serialization, state, top-down parsing languages,
       transducer

CATEGORY
       Parsing and Grammars

COPYRIGHT
       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1                  pt::peg::to::json(n)