pt::peg::to::json(n)             Parser Tools             pt::peg::to::json(n)

______________________________________________________________________________

NAME

       pt::peg::to::json - PEG Conversion. Write JSON format

SYNOPSIS

       package require Tcl  8.5

       package require pt::peg::to::json  ?1?

       package require pt::peg

       package require json::write

       pt::peg::to::json reset

       pt::peg::to::json configure

       pt::peg::to::json configure option

       pt::peg::to::json configure option value...

       pt::peg::to::json convert serial

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In that
       case please read the overview provided by the Introduction to Parser
       Tools. That document is the entry point to the whole system the current
       package is a part of.

       This package implements the converter from parsing expression grammars
       to JSON markup.

       It resides in the Export section of the Core Layer of Parser Tools, and
       can be used either directly with the other packages of this layer, or
       indirectly through the export manager provided by pt::peg::export. The
       latter is intended for use in untrusted environments and is done
       through the corresponding export plugin pt::peg::export::json sitting
       between converter and export manager.

       IMAGE: arch_core_eplugins

API

       The API provided by this package satisfies the specification of the
       Converter API found in the Parser Tools Export API specification.

       pt::peg::to::json reset
              This command resets the configuration of the package to its
              default settings.

       pt::peg::to::json configure
              This command returns a dictionary containing the current
              configuration of the package.

       pt::peg::to::json configure option
              This command returns the current value of the specified
              configuration option of the package. For the set of legal
              options, please read the section Options.

       pt::peg::to::json configure option value...
              This command sets the given configuration options of the
              package to the specified values. For the set of legal options,
              please read the section Options.

       pt::peg::to::json convert serial
              This command takes the canonical serialization of a parsing
              expression grammar, as specified in section PEG serialization
              format, and contained in serial, and generates JSON markup
              encoding the grammar, per the current package configuration.
              The created string is then returned as the result of the
              command.
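
       The commands above might be combined as in the following minimal
       sketch, which assumes that Tcllib is installed. The two-rule grammar
       and the variable names serial and json are illustrative only and not
       part of the package. The command pt::peg canonicalize comes from the
       pt::peg package and is used here to obtain the canonical form that
       convert expects.

```tcl
package require Tcl 8.5
package require pt::peg
package require pt::peg::to::json

# Illustrative grammar (an assumption, not shipped with the package):
#   Number <- Digit+ ;  Digit <- '0' / '1'
set serial [pt::peg canonicalize {
    pt::grammar::peg {
        rules {
            Digit  {is {/ {t 0} {t 1}} mode leaf}
            Number {is {+ {n Digit}}   mode value}
        }
        start {n Number}
    }
}]

# Start from the default configuration, then convert.
pt::peg::to::json reset
set json [pt::peg::to::json convert $serial]
puts $json
```

       Handing convert a serialization that is not canonical is expected to
       raise an error, which is why the sketch canonicalizes first.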

OPTIONS

       The converter to the JSON grammar exchange format recognizes the
       following configuration variables and changes its behaviour as they
       specify.

       -file string
              The value of this option is the name of the file or other entity
              from which the grammar came, for which the command is run. The
              default value is unknown.

       -name string
              The value of this option is the name of the grammar we are
              processing. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the
              command is run. The default value is unknown.

       -indented boolean
              If this option is set the system will break the generated JSON
              across lines and indent it according to its inner structure,
              with each key of a dictionary on a separate line.

              If the option is not set (the default), the whole JSON object
              will be written on a single line, with minimum spacing between
              all elements.

       -aligned boolean
              If this option is set the system will ensure that the values for
              the keys in a dictionary are vertically aligned with each other,
              for a nice table effect. For this to work, setting this option
              also implies that -indented is set.

              If the option is not set (the default), the output is formatted
              as per the value of -indented, without trying to align the
              values for dictionary keys.
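
       As a hedged illustration of the formatting options, the sketch below
       converts the same grammar twice, once with the defaults and once with
       -indented and -aligned set. It assumes Tcllib is installed; the
       one-rule grammar and the variable names compact, pretty and was are
       made up for this example.

```tcl
package require pt::peg
package require pt::peg::to::json

# Illustrative one-rule grammar (an assumption): A <- 'a'
set serial [pt::peg canonicalize {
    pt::grammar::peg {
        rules {A {is {t a} mode value}}
        start {n A}
    }
}]

# Default settings: the JSON object is written on a single line.
pt::peg::to::json reset
set compact [pt::peg::to::json convert $serial]

# Indented and aligned output spans several lines.
pt::peg::to::json configure -indented 1 -aligned 1
set pretty [pt::peg::to::json convert $serial]
set was    [pt::peg::to::json configure -indented]

puts $compact
puts $pretty
puts [pt::peg::to::json configure]   ;# dictionary of all options

# Restore the defaults so later conversions are not affected.
pt::peg::to::json reset
```

       Because the configuration belongs to the package rather than to a
       single call, resetting it after use keeps unrelated conversions from
       inheriting these settings.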

JSON GRAMMAR EXCHANGE FORMAT

       The JSON format for parsing expression grammars was written as a data
       exchange format not bound to Tcl. It was defined to allow the exchange
       of grammars with PackRat/PEG based parser generators for other
       languages.

       It is formally specified by the rules below:

       [1]    The JSON of any PEG is a JSON object.

       [2]    This object holds a single key, pt::grammar::peg, and its value.
              This value holds the contents of the grammar.

       [3]    The contents of the grammar are a JSON object holding the set of
              nonterminal symbols and the starting expression. The relevant
              keys and their values are

              rules  The value is a JSON object whose keys are the names of
                     the nonterminal symbols known to the grammar.

                     [1]    Each nonterminal symbol may occur only once.

                     [2]    The empty string is not a legal nonterminal
                            symbol.

                     [3]    The value for each symbol is a JSON object itself.
                            The relevant keys and their values in this object
                            are

                            is     The value is a JSON string holding the Tcl
                                   serialization of the parsing expression
                                   describing the symbol's sentential
                                   structure, as specified in the section PE
                                   serialization format.

                            mode   The value is a JSON string holding one of
                                   three values specifying how a parser should
                                   handle the semantic value produced by the
                                   symbol.

                                   value  The semantic value of the
                                          nonterminal symbol is an abstract
                                          syntax tree consisting of a single
                                          node for the nonterminal itself,
                                          which has the ASTs of the symbol's
                                          right hand side as its children.

                                   leaf   The semantic value of the
                                          nonterminal symbol is an abstract
                                          syntax tree consisting of a single
                                          node for the nonterminal, without
                                          any children. Any ASTs generated by
                                          the symbol's right hand side are
                                          discarded.

                                   void   The nonterminal has no semantic
                                          value. Any ASTs generated by the
                                          symbol's right hand side are
                                          discarded (as well).

              start  The value is a JSON string holding the Tcl serialization
                     of the start parsing expression of the grammar, as
                     specified in the section PE serialization format.

       [4]    The terminal symbols of the grammar are specified implicitly as
              the set of all terminal symbols used in the start expression and
              on the RHS of the grammar rules.

       As an aside to the advanced reader, this is pretty much the same as the
       Tcl serialization of PE grammars, as specified in section PEG
       serialization format, except that the Tcl dictionaries and lists of
       that format are mapped to JSON objects and arrays. Only the parsing
       expressions themselves are not translated further, but kept as JSON
       strings containing a nested Tcl list, and there is no concept of
       canonicity for the JSON either.

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;


       a JSON serialization for it is

              {
                  "pt::grammar::peg" : {
                      "rules" : {
                          "AddOp"     : {
                              "is"   : "\/ {t +} {t -}",
                              "mode" : "value"
                          },
                          "Digit"     : {
                              "is"   : "\/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}",
                              "mode" : "value"
                          },
                          "Expression" : {
                              "is"   : "x {n Term} {* {x {n AddOp} {n Term}}}",
                              "mode" : "value"
                          },
                          "Factor"    : {
                              "is"   : "\/ {x {t (} {n Expression} {t )}} {n Number}",
                              "mode" : "value"
                          },
                          "MulOp"     : {
                              "is"   : "\/ {t *} {t \/}",
                              "mode" : "value"
                          },
                          "Number"    : {
                              "is"   : "x {? {n Sign}} {+ {n Digit}}",
                              "mode" : "value"
                          },
                          "Sign"      : {
                              "is"   : "\/ {t -} {t +}",
                              "mode" : "value"
                          },
                          "Term"      : {
                              "is"   : "x {n Factor} {* {x {n MulOp} {n Factor}}}",
                              "mode" : "value"
                          }
                      },
                      "start" : "n Expression"
                  }
              }


       and a Tcl serialization of the same is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}}                                                                mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
                      MulOp      {is {/ {t *} {t /}}                                                                mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
                      Sign       {is {/ {t -} {t +}}                                                                mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
                  }
                  start {n Expression}
              }


       The similarity of the latter to the JSON should be quite obvious.
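
       To make the mapping between the two forms concrete, the sketch below
       converts a small grammar to JSON and reads it back with
       json::json2dict from Tcllib's json package. That package is not a
       dependency of pt::peg::to::json and is used here only for
       illustration; the grammar and the variable names are likewise
       assumptions.

```tcl
package require pt::peg
package require pt::peg::to::json
package require json   ;# Tcllib JSON parser, provides json::json2dict

# Illustrative grammar (an assumption): Number <- Digit+
set serial [pt::peg canonicalize {
    pt::grammar::peg {
        rules {
            Digit  {is {/ {t 0} {t 1}} mode leaf}
            Number {is {+ {n Digit}}   mode value}
        }
        start {n Number}
    }
}]
pt::peg::to::json reset
set json [pt::peg::to::json convert $serial]

# JSON objects come back as Tcl dictionaries ...
set grammar [dict get [json::json2dict $json] pt::grammar::peg]
set is      [dict get $grammar rules Number is]

# ... while each "is" value is a plain string holding a nested Tcl list.
puts "operator: [lindex $is 0]"
puts "operand:  [lindex $is 1]"
puts "start:    [dict get $grammar start]"
```

       A consumer written in another language would instead need its own
       small parser for the Tcl list syntax held in the "is" and "start"
       strings.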

PEG SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of them
       will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg, and
                     its value. This value holds the contents of the grammar.

              [3]    The contents of the grammar are a Tcl dictionary holding
                     the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal nonterminal
                                   symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three values
                                          specifying how a parser should
                                          handle the semantic value produced
                                          by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified in
                            the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in the
                     start expression and on the RHS of the grammar rules.

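       Rule [4] can be made concrete with a short sketch in plain Tcl that
       walks a serialization and collects every terminal it uses. The
       procedure names pe_terminals and peg_terminals and the sample grammar
       are hypothetical and not part of Parser Tools; the code only assumes
       the dictionary layout described above.

```tcl
# Record the terminals of one parsing expression in the array named by
# setVar (the array keys act as a set).
proc pe_terminals {pe setVar} {
    upvar 1 $setVar seen
    set operands [lassign $pe op]
    switch -exact -- $op {
        t { set seen([lindex $operands 0]) 1 }
        / - x - * - + - & - ! - ? {
            # Operators: descend into each operand expression.
            foreach sub $operands { pe_terminals $sub seen }
        }
        default {
            # Nonterminals and character classes carry no terminals.
        }
    }
}

# Return the sorted set of terminals used by the start expression and
# by the right hand sides of all rules.
proc peg_terminals {serial} {
    set g [dict get $serial pt::grammar::peg]
    array set seen {}
    pe_terminals [dict get $g start] seen
    dict for {symbol def} [dict get $g rules] {
        pe_terminals [dict get $def is] seen
    }
    return [lsort [array names seen]]
}

# Illustrative grammar (an assumption), a fragment of the calculator.
set serial {
    pt::grammar::peg {
        rules {
            Digit  {is {/ {t 0} {t 1}}                mode value}
            Number {is {x {? {n Sign}} {+ {n Digit}}} mode value}
            Sign   {is {/ {t -} {t +}}                mode value}
        }
        start {n Number}
    }
}
puts [peg_terminals $serial]   ;# prints: + - 0 1
```
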
       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally satisfies
              the constraints below, which make it unique among all the
              possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary. I.e. it does not
                     contain superfluous whitespace.

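       The two constraints can be sketched in plain Tcl as shown below. This
       is an illustration only: the procedures canon_pe and canon_peg are
       hypothetical, perform no validation, and are not how the package is
       implemented. Real code would more likely call pt::peg canonicalize
       from the pt::peg package.

```tcl
# Rebuild a parsing expression with the list command so that its string
# representation carries no superfluous whitespace.
proc canon_pe {pe} {
    set operands [lassign $pe op]
    switch -exact -- $op {
        / - x - * - + - & - ! - ? {
            set out [list $op]
            foreach sub $operands { lappend out [canon_pe $sub] }
            return $out
        }
        default {
            # Atoms such as epsilon, {t x} and {n A}.
            return [list $op {*}$operands]
        }
    }
}

# Rebuild a grammar with all dictionary keys in ascending order.
proc canon_peg {serial} {
    set g [dict get $serial pt::grammar::peg]
    set rules {}
    foreach symbol [lsort -increasing -dict [dict keys [dict get $g rules]]] {
        set def [dict get $g rules $symbol]
        lappend rules $symbol \
            [list is [canon_pe [dict get $def is]] mode [dict get $def mode]]
    }
    return [list pt::grammar::peg \
                [list rules $rules start [canon_pe [dict get $g start]]]]
}

# A regular, non-canonical serialization: unsorted keys, extra spaces.
set messy {
    pt::grammar::peg {
        start {n   Sign}
        rules {
            Sign  {mode value is {/  {t -}   {t +}}}
            Digit {is {t 0} mode leaf}
        }
    }
}
puts [canon_peg $messy]
```

       The result is the single line "pt::grammar::peg {rules {Digit {is {t
       0} mode leaf} Sign {is {/ {t -} {t +}} mode value}} start {n Sign}}"
       with the keys is and mode, rules and start, and Digit and Sign each
       in sorted order.
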
   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;


       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}}                                                                mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
                      MulOp      {is {/ {t *} {t /}}                                                                mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
                      Sign       {is {/ {t -} {t +}}                                                                mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
                  }
                  start {n Expression}
              }

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison, etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression. It
                            matches any character.

                     [3]    The string alnum is an atomic parsing expression.
                            It matches any Unicode alphabet or digit
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing expression.
                            It matches any Unicode alphabet character. This is
                            a custom extension of PEs based on Tcl's builtin
                            command string is.

                     [5]    The string ascii is an atomic parsing expression.
                            It matches any Unicode character below U+0080.
                            This is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing expression.
                            It matches any Unicode digit character. Note that
                            this includes characters outside of the [0..9]
                            range. This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing expression.
                            It matches any Unicode printing character, except
                            for space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [9]    The string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing expression.
                            It matches any Unicode printing character,
                            including space. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [11]   The string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing expression.
                            It matches any Unicode space character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command string is.

                     [13]   The string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word character.
                            This is any alphanumeric character (see alnum),
                            and any connector punctuation characters (e.g.
                            underscore). This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [15]   The string xdigit is an atomic parsing expression.
                            It matches any hexadecimal digit character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [16]   The string ddigit is an atomic parsing expression.
                            It matches any decimal digit character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

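       Since most of the character-class atoms are defined in terms of Tcl's
       string is, their meaning for a single character can be sketched in
       plain Tcl as follows. The procedure atom_matches is hypothetical and
       not part of Parser Tools, and the range [0-9] used for ddigit is an
       assumption drawn from the description of digit and ddigit above.

```tcl
# Report whether the single character ch is matched by a
# character-class atom. Illustration only.
proc atom_matches {atom ch} {
    if {[string length $ch] != 1} { return 0 }
    switch -exact -- $atom {
        dot    { return 1 }
        ddigit { return [regexp {^[0-9]$} $ch] }
        alnum - alpha - ascii - control - digit - graph - lower -
        print - punct - space - upper - wordchar - xdigit {
            # The class names are those accepted by string is.
            return [string is $atom -strict $ch]
        }
        default { error "not a character-class atom: $atom" }
    }
}

# U+0663 is ARABIC-INDIC DIGIT THREE: a Unicode digit, but not in 0-9.
puts [atom_matches digit  \u0663]   ;# prints: 1
puts [atom_matches ddigit \u0663]   ;# prints: 0
puts [atom_matches ddigit 7]        ;# prints: 1
puts [atom_matches wordchar _]      ;# prints: 1
```
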
              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is a parsing expression as
                            well. This is the ordered choice, aka prioritized
                            choice.

                     [2]    For parsing expressions e1, e2, ... the result of
                            [list x e1 e2 ... ] is a parsing expression as
                            well. This is the sequence.

                     [3]    For a parsing expression e the result of [list *
                            e] is a parsing expression as well. This is the
                            Kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of [list +
                            e] is a parsing expression as well. This is the
                            positive Kleene closure, describing one or more
                            repetitions.

                     [5]    For a parsing expression e the result of [list &
                            e] is a parsing expression as well. This is the
                            and lookahead predicate.

                     [6]    For a parsing expression e the result of [list !
                            e] is a parsing expression as well. This is the
                            not lookahead predicate.

                     [7]    For a parsing expression e the result of [list ?
                            e] is a parsing expression as well. This is the
                            optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among all
              the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

                  Expression <- Term (AddOp Term)*


       then its canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}
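
       The same value can be assembled with Tcl's list command, exactly as
       the [list ...] forms in the rules above suggest. The following
       plain-Tcl sketch uses the illustrative variable names term, addop
       and pe.

```tcl
# Expression <- Term (AddOp Term)*
set term  [list n Term]
set addop [list n AddOp]

# Sequence of Term followed by zero or more (AddOp Term) pairs.
set pe [list x $term [list * [list x $addop $term]]]

puts $pe   ;# prints: x {n Term} {* {x {n AddOp} {n Term}}}
```

       Building the value through list rather than by hand-written string
       yields a pure list, whose string representation contains no
       superfluous whitespace.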

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       EBNF, JSON, LL(k), PEG, TDPL, context-free languages, conversion,
       expression, format conversion, grammar, matching, parser, parsing
       expression, parsing expression grammar, push down automaton, recursive
       descent, serialization, state, top-down parsing languages, transducer

CATEGORY

       Parsing and Grammars

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1                  pt::peg::to::json(n)