pt::peg::to::json(n)             Parser Tools             pt::peg::to::json(n)

______________________________________________________________________________

NAME

       pt::peg::to::json - PEG Conversion. Write JSON format


SYNOPSIS

       package require Tcl 8.5

       package require pt::peg::to::json  ?1?

       package require pt::peg

       package require json::write

       pt::peg::to::json reset

       pt::peg::to::json configure

       pt::peg::to::json configure option

       pt::peg::to::json configure option value...

       pt::peg::to::json convert serial

______________________________________________________________________________


DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. This document is the entry point to the whole system
       the current package is a part of.

       This package implements the converter from parsing expression
       grammars to JSON markup.

       It resides in the Export section of the Core Layer of Parser Tools,
       and can be used either directly with the other packages of this
       layer, or indirectly through the export manager provided by
       pt::peg::export. The latter is intended for use in untrusted
       environments and is done through the corresponding export plugin
       pt::peg::export::json, which sits between the converter and the
       export manager.

       IMAGE: arch_core_eplugins


API

       The API provided by this package satisfies the specification of the
       Converter API found in the Parser Tools Export API specification.

       pt::peg::to::json reset
              This command resets the configuration of the package to its
              default settings.

       pt::peg::to::json configure
              This command returns a dictionary containing the current
              configuration of the package.

       pt::peg::to::json configure option
              This command returns the current value of the specified
              configuration option of the package. For the set of legal
              options, please read the section Options.

       pt::peg::to::json configure option value...
              This command sets the given configuration options of the
              package to the specified values. For the set of legal
              options, please read the section Options.

       pt::peg::to::json convert serial
              This command takes the canonical serialization of a parsing
              expression grammar, as specified in section PEG serialization
              format and contained in serial, and generates JSON markup
              encoding the grammar, per the current package configuration.
              The created string is then returned as the result of the
              command.
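
       The following is a minimal usage sketch of the commands above. It
       assumes that Tcllib's pt packages are installed; the one-rule
       grammar (A <- 'a') is made up for illustration and is already in
       canonical form, as convert expects.

```tcl
package require Tcl 8.5-
package require pt::peg::to::json

# A tiny, made-up grammar in canonical form: A <- 'a', start symbol A.
set serial {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}

# Return to the default configuration (neither indented nor aligned).
pt::peg::to::json reset

# Convert the grammar; the JSON markup is returned as a string.
set json [pt::peg::to::json convert $serial]
puts $json
```

       With the default configuration the result is a single line of JSON
       whose top-level key is pt::grammar::peg.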


OPTIONS

       The converter to the JSON grammar exchange format recognizes the
       following configuration options and changes its behaviour as they
       specify.

       -file string
              The value of this option is the name of the file or other
              entity from which the grammar came, for which the command is
              run. The default value is unknown.

       -name string
              The value of this option is the name of the grammar being
              processed. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for which
              the command is run. The default value is unknown.

       -indented boolean
              If this option is set, the system breaks the generated JSON
              across lines and indents it according to its inner structure,
              with each key of a dictionary on a separate line.

              If the option is not set (the default), the whole JSON object
              is written on a single line, with minimal spacing between all
              elements.

       -aligned boolean
              If this option is set, the system ensures that the values for
              the keys in a dictionary are vertically aligned with each
              other, for a nice table effect. To make this work, setting
              this option also implies that -indented is set.

              If the option is not set (the default), the output is
              formatted as per the value of -indented, without trying to
              align the values for dictionary keys.
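
       A short sketch of setting and querying these options, assuming
       Tcllib's pt packages are installed. The values calculator and
       calc.peg are illustrative only; they are recorded in the
       configuration and do not affect the grammar itself.

```tcl
package require Tcl 8.5-
package require pt::peg::to::json

# Start from the documented defaults.
pt::peg::to::json reset

# Record where the grammar came from (illustrative values) and request
# indented, aligned output. -aligned alone would already imply -indented.
pt::peg::to::json configure -name calculator -file calc.peg \
    -indented 1 -aligned 1

# Query a single option, then the whole configuration dictionary.
puts [pt::peg::to::json configure -name]
puts [pt::peg::to::json configure]

# Convert a made-up one-rule grammar using the settings above.
set serial {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}
puts [pt::peg::to::json convert $serial]
```

       Calling reset afterwards restores the defaults, for example
       a_pe_grammar for -name.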

JSON GRAMMAR EXCHANGE FORMAT

       The JSON format for parsing expression grammars was written as a
       data exchange format not bound to Tcl. It was defined to allow the
       exchange of grammars with PackRat/PEG based parser generators for
       other languages.

       It is formally specified by the rules below:

       [1]    The JSON of any PEG is a JSON object.

       [2]    This object holds a single key, pt::grammar::peg, and its
              value. This value holds the contents of the grammar.

       [3]    The contents of the grammar are a JSON object holding the set
              of nonterminal symbols and the starting expression. The
              relevant keys and their values are

              rules  The value is a JSON object whose keys are the names of
                     the nonterminal symbols known to the grammar.

                     [1]    Each nonterminal symbol may occur only once.

                     [2]    The empty string is not a legal nonterminal
                            symbol.

                     [3]    The value for each symbol is a JSON object
                            itself. The relevant keys and their values in
                            this object are

                            is     The value is a JSON string holding the
                                   Tcl serialization of the parsing
                                   expression describing the symbol's
                                   sentential structure, as specified in
                                   the section PE serialization format.

                            mode   The value is a JSON string holding one
                                   of three values specifying how a parser
                                   should handle the semantic value
                                   produced by the symbol.

                                   value  The semantic value of the
                                          nonterminal symbol is an
                                          abstract syntax tree consisting
                                          of a single node for the
                                          nonterminal itself, which has
                                          the ASTs of the symbol's right
                                          hand side as its children.

                                   leaf   The semantic value of the
                                          nonterminal symbol is an
                                          abstract syntax tree consisting
                                          of a single node for the
                                          nonterminal, without any
                                          children. Any ASTs generated by
                                          the symbol's right hand side are
                                          discarded.

                                   void   The nonterminal has no semantic
                                          value. Any ASTs generated by the
                                          symbol's right hand side are
                                          discarded (as well).

              start  The value is a JSON string holding the Tcl
                     serialization of the start parsing expression of the
                     grammar, as specified in the section PE serialization
                     format.

       [4]    The terminal symbols of the grammar are specified implicitly
              as the set of all terminal symbols used in the start
              expression and on the RHS of the grammar rules.

       As an aside to the advanced reader, this is pretty much the same as
       the Tcl serialization of PE grammars, as specified in section PEG
       serialization format, except that the Tcl dictionaries and lists of
       that format are mapped to JSON objects and arrays. Only the parsing
       expressions themselves are not translated further, but kept as JSON
       strings containing a nested Tcl list, and there is no concept of
       canonicity for the JSON either.
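
       To illustrate the mapping described by the rules above, here is a
       sketch in plain Tcl of how a Tcl serialization could be turned into
       this JSON format. The helpers jsonString and pegToJson are
       hypothetical and are not the implementation used by this package,
       which relies on json::write. Unlike the JSON in the EXAMPLE
       section, the sketch does not escape "/" as "\/"; JSON accepts both
       forms.

```tcl
package require Tcl 8.5-

# Hypothetical helper: quote a Tcl string as a JSON string literal.
# Only backslash, double quote, newline, carriage return and tab are
# escaped; this is enough for typical grammars, not a full JSON encoder.
proc jsonString {s} {
    set map [list \\ \\\\ \" \\\" \n \\n \r \\r \t \\t]
    return "\"[string map $map $s]\""
}

# Hypothetical helper: map a Tcl PEG serialization onto the JSON format.
# Dictionaries become JSON objects (keys keep the input order), while
# parsing expressions stay Tcl lists inside JSON strings.
proc pegToJson {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set members {}
    dict for {symbol def} [dict get $grammar rules] {
        set is   [jsonString [dict get $def is]]
        set mode [jsonString [dict get $def mode]]
        lappend members "[jsonString $symbol]:{\"is\":$is,\"mode\":$mode}"
    }
    set start [jsonString [dict get $grammar start]]
    return "{\"pt::grammar::peg\":{\"rules\":{[join $members ,]},\"start\":$start}}"
}

puts [pegToJson {pt::grammar::peg {rules {Sign {is {/ {t -} {t +}} mode value}} start {n Sign}}}]
```

       The sketch always emits the single-line form and does not verify
       that its input follows the serialization rules.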

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       a JSON serialization for it is

              {
                  "pt::grammar::peg" : {
                      "rules" : {
                          "AddOp"     : {
                              "is"   : "\/ {t +} {t -}",
                              "mode" : "value"
                          },
                          "Digit"     : {
                              "is"   : "\/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}",
                              "mode" : "value"
                          },
                          "Expression" : {
                              "is"   : "x {n Term} {* {x {n AddOp} {n Term}}}",
                              "mode" : "value"
                          },
                          "Factor"    : {
                              "is"   : "\/ {x {t (} {n Expression} {t )}} {n Number}",
                              "mode" : "value"
                          },
                          "MulOp"     : {
                              "is"   : "\/ {t *} {t \/}",
                              "mode" : "value"
                          },
                          "Number"    : {
                              "is"   : "x {? {n Sign}} {+ {n Digit}}",
                              "mode" : "value"
                          },
                          "Sign"      : {
                              "is"   : "\/ {t -} {t +}",
                              "mode" : "value"
                          },
                          "Term"      : {
                              "is"   : "x {n Factor} {* {x {n MulOp} {n Factor}}}",
                              "mode" : "value"
                          }
                      },
                      "start" : "n Expression"
                  }
              }

       and a Tcl serialization of the same is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}}                                                                mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
                      MulOp      {is {/ {t *} {t /}}                                                                mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
                      Sign       {is {/ {t -} {t +}}                                                                mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
                  }
                  start {n Expression}
              }

       The similarity of the latter to the JSON should be quite obvious.


PEG SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of
       them will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg,
                     and its value. This value holds the contents of the
                     grammar.

              [3]    The contents of the grammar are a Tcl dictionary
                     holding the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal
                                   nonterminal symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three
                                          values specifying how a parser
                                          should handle the semantic value
                                          produced by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single
                                                 node for the nonterminal
                                                 itself, which has the ASTs
                                                 of the symbol's right hand
                                                 side as its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single
                                                 node for the nonterminal,
                                                 without any children. Any
                                                 ASTs generated by the
                                                 symbol's right hand side
                                                 are discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified
                            in the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in
                     the start expression and on the RHS of the grammar
                     rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among
              all the possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the
                     canonical representation of a Tcl dictionary, i.e. it
                     does not contain superfluous whitespace.
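
       As an illustration of the regular serialization rules above, the
       following sketch checks a value against rules [1] to [3]. The
       procedure name checkPEG is hypothetical; it does not validate the
       parsing expressions themselves and does not test for canonicity.
       Because a Tcl dictionary collapses duplicate keys (the last value
       wins), the "occurs only once" rule is checked on the list form.

```tcl
package require Tcl 8.5-

# Hypothetical validator for rules [1] to [3] of the regular PEG
# serialization. Returns an empty string if serial passes the checks,
# otherwise a short description of the first problem found.
proc checkPEG {serial} {
    # [1], [2]: a dictionary holding the single key pt::grammar::peg.
    if {[catch {dict keys $serial} keys] || $keys ne "pt::grammar::peg"} {
        return "expected a dictionary with the single key pt::grammar::peg"
    }
    set grammar [dict get $serial pt::grammar::peg]

    # [3]: the grammar contents must provide the keys rules and start.
    foreach key {rules start} {
        if {[catch {dict exists $grammar $key} found] || !$found} {
            return "the grammar lacks the key $key"
        }
    }
    set rules [dict get $grammar rules]

    # Compare the number of distinct symbols with the number of pairs.
    if {[catch {dict size $rules} size] || 2 * $size != [llength $rules]} {
        return "rules is not a dictionary, or a symbol occurs more than once"
    }
    dict for {symbol def} $rules {
        if {$symbol eq ""} {
            return "the empty string is not a legal nonterminal symbol"
        }
        if {[catch {dict exists $def mode} found] || !$found
                || ![dict exists $def is]} {
            return "rule '$symbol' lacks the key is or mode"
        }
        if {[dict get $def mode] ni {value leaf void}} {
            return "rule '$symbol' has an invalid mode"
        }
    }
    return ""
}

# A made-up grammar with an illegal mode, to show a failing check.
puts [checkPEG {pt::grammar::peg {rules {A {is {t a} mode bogus}} start {n A}}}]
```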

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}}                                                                mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
                      MulOp      {is {/ {t *} {t /}}                                                                mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
                      Sign       {is {/ {t -} {t +}}                                                                mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
                  }
                  start {n Expression}
              }
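
       The canonical form shown in the example can be derived from a
       regular serialization by sorting the dictionary keys and letting Tcl
       regenerate the string representation. The sketch below assumes a
       valid regular serialization as input; canonicalPEG is a
       hypothetical name, and the parsing expressions in is and start are
       copied unchanged rather than canonicalized themselves.

```tcl
package require Tcl 8.5-

# Hypothetical helper: rebuild a regular PEG serialization with all
# dictionary keys in ascending dictionary order. Building the result
# with dict create lets Tcl produce a string form without superfluous
# whitespace.
proc canonicalPEG {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set rules   [dict get $grammar rules]
    set sorted  [dict create]
    foreach symbol [lsort -increasing -dictionary [dict keys $rules]] {
        set def [dict get $rules $symbol]
        # The keys "is" and "mode" are inserted in dictionary order.
        dict set sorted $symbol [dict create \
            is [dict get $def is] mode [dict get $def mode]]
    }
    # "rules" sorts before "start", so insertion order is sorted order.
    set contents [dict create rules $sorted start [dict get $grammar start]]
    return [dict create pt::grammar::peg $contents]
}

# A made-up regular (non-canonical) serialization: unsorted keys and
# extra whitespace.
set regular {pt::grammar::peg {
    start {n Number}
    rules {
        Sign   {mode value is {/ {t -} {t +}}}
        Number {is {x {? {n Sign}} {+ {n Digit}}} mode value}
        Digit  {is {/ {t 0} {t 1}} mode value}
    }
}}
puts [canonicalPEG $regular]
```

       For this input the result lists the rules in the order Digit,
       Number, Sign, each with the keys is and mode, followed by start.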

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison,
       etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression.
                            It matches any character.

                     [3]    The string alnum is an atomic parsing
                            expression. It matches any Unicode alphabet or
                            digit character. This is a custom extension of
                            PEs based on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing
                            expression. It matches any Unicode alphabet
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [5]    The string ascii is an atomic parsing
                            expression. It matches any Unicode character
                            below U+0080. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing
                            expression. It matches any Unicode digit
                            character. Note that this includes characters
                            outside of the [0..9] range. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [8]    The string graph is an atomic parsing
                            expression. It matches any Unicode printing
                            character, except for space. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [9]    The string lower is an atomic parsing
                            expression. It matches any Unicode lower-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing
                            expression. It matches any Unicode printing
                            character, including space. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [11]   The string punct is an atomic parsing
                            expression. It matches any Unicode punctuation
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [12]   The string space is an atomic parsing
                            expression. It matches any Unicode space
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [13]   The string upper is an atomic parsing
                            expression. It matches any Unicode upper-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word
                            character. This is any alphanumeric character
                            (see alnum), and any connector punctuation
                            characters (e.g. underscore). This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [15]   The string xdigit is an atomic parsing
                            expression. It matches any hexadecimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [16]   The string ddigit is an atomic parsing
                            expression. It matches any decimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result
                            of [list / e1 e2 ... ] is a parsing expression
                            as well. This is the ordered choice, aka
                            prioritized choice.

                     [2]    For parsing expressions e1, e2, ... the result
                            of [list x e1 e2 ... ] is a parsing expression
                            as well. This is the sequence.

                     [3]    For a parsing expression e the result of
                            [list * e] is a parsing expression as well.
                            This is the Kleene closure, describing zero or
                            more repetitions.

                     [4]    For a parsing expression e the result of
                            [list + e] is a parsing expression as well.
                            This is the positive Kleene closure, describing
                            one or more repetitions.

                     [5]    For a parsing expression e the result of
                            [list & e] is a parsing expression as well.
                            This is the and lookahead predicate.

                     [6]    For a parsing expression e the result of
                            [list ! e] is a parsing expression as well.
                            This is the not lookahead predicate.

                     [7]    For a parsing expression e the result of
                            [list ? e] is a parsing expression as well.
                            This is the optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then
              additionally satisfies the constraints below, which make it
              unique among all the possible serializations of this parsing
              expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list, i.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).
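
       Rule [1] of the canonical form can be met by rebuilding the
       expression recursively as a pure Tcl list, so that Tcl generates the
       string representation. The procedure canonicalPE below is a
       hypothetical sketch; it knows only the operators listed above and
       does not address rule [2], since range expressions are not
       described in this section.

```tcl
package require Tcl 8.5-

# Hypothetical helper: rebuild a parsing expression as a pure Tcl list so
# that its string form carries no superfluous whitespace.
proc canonicalPE {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        t - n {
            # Terminal or nonterminal: keep the operand verbatim.
            return [list $op [lindex $pe 1]]
        }
        / - x - * - + - & - ! - ? {
            # Combined expression: canonicalize every sub-expression.
            set result [list $op]
            foreach sub [lrange $pe 1 end] {
                lappend result [canonicalPE $sub]
            }
            return $result
        }
        default {
            # Atomic string such as epsilon, dot, alnum, ...
            return $op
        }
    }
}

# The expression from the example below, with superfluous whitespace.
puts [canonicalPE "x  {n   Term}\n   {* {x {n AddOp}   {n Term}}}"]
```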

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

                  Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}
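
       Going in the other direction, a serialized expression can be
       rendered back into the textual notation used in the examples. The
       procedures peToText and peWrap are a hypothetical sketch: they add
       parentheses only around choices and sequences nested as operands,
       and rendering atomic strings other than dot as <name> is an
       assumption of this sketch, not necessarily the notation of the PEG
       text format.

```tcl
package require Tcl 8.5-

# Hypothetical helper: render a serialized parsing expression as text.
proc peToText {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        t   { return '[lindex $pe 1]' }
        n   { return [lindex $pe 1] }
        dot { return . }
        / {
            # Ordered choice: only nested choices need parentheses.
            set parts {}
            foreach sub [lrange $pe 1 end] { lappend parts [peWrap $sub {/}] }
            return [join $parts " / "]
        }
        x {
            # Sequence: parenthesize nested choices and sequences.
            set parts {}
            foreach sub [lrange $pe 1 end] { lappend parts [peWrap $sub {/ x}] }
            return [join $parts " "]
        }
        * - + - ? { return "[peWrap [lindex $pe 1] {/ x}]$op" }
        & - !     { return "$op[peWrap [lindex $pe 1] {/ x}]" }
        default   { return <$op> }
    }
}

# Render pe, adding parentheses if its operator is one of wrapOps.
proc peWrap {pe wrapOps} {
    set text [peToText $pe]
    if {[lindex $pe 0] in $wrapOps} {
        return "($text)"
    }
    return $text
}

puts [peToText {x {n Term} {* {x {n AddOp} {n Term}}}}]
```

       Applied to the canonical serialization above, this prints
       Term (AddOp Term)*, the right-hand side of the original rule.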

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly
       contain bugs and other problems. Please report such in the category
       pt of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].
       Please also report any ideas for enhancements you may have for
       either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.


KEYWORDS

       EBNF, JSON, LL(k), PEG, TDPL, context-free languages, conversion,
       expression, format conversion, grammar, matching, parser, parsing
       expression, parsing expression grammar, push down automaton,
       recursive descent, serialization, state, top-down parsing languages,
       transducer


CATEGORY

       Parsing and Grammars

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>


tcllib                                 1                  pt::peg::to::json(n)