pt::peg::export::json(n)         Parser Tools         pt::peg::export::json(n)

______________________________________________________________________________

NAME

       pt::peg::export::json - PEG Export Plugin. Write JSON format

SYNOPSIS

       package require Tcl 8.5

       package require pt::peg::export::json ?1?

       package require pt::peg::to::json

       export serial configuration

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In that
       case please read the overview provided by the Introduction to Parser
       Tools. That document is the entry point to the whole system the current
       package is a part of.

       This package implements the parsing expression grammar export plugin
       for the generation of JSON markup.

       It resides in the Export section of the Core Layer of Parser Tools and
       is intended to be used by pt::peg::export, the export manager, sitting
       between it and the corresponding core conversion functionality provided
       by pt::peg::to::json.

       IMAGE: arch_core_eplugins

       While the direct use of this package with a regular interpreter is
       possible, this is strongly discouraged and requires a number of
       contortions to provide the expected environment. The proper way to use
       this functionality depends on the situation:

       [1]    In an untrusted environment the proper access is through the
              package pt::peg::export and the export manager objects it
              provides.

       [2]    In a trusted environment, however, simply use the package
              pt::peg::to::json and access the core conversion functionality
              directly.

API

       The API provided by this package satisfies the specification of the
       Plugin API found in the Parser Tools Export API specification.

       export serial configuration
              This command takes the canonical serialization of a parsing
              expression grammar, as specified in section PEG serialization
              format and contained in serial, together with the configuration,
              a dictionary, and generates JSON markup encoding the grammar. The
              created string is returned as the result of the command.

CONFIGURATION

       The JSON export plugin recognizes the following configuration variables
       and changes its behaviour as they specify.

       boolean indented
              If this flag is set the plugin will break the generated JSON code
              across lines and indent it according to its inner structure,
              with each key of a dictionary on a separate line.

              If this flag is not set (the default), the whole JSON object
              will be written on a single line, with minimum spacing between
              all elements.

       boolean aligned
              If this flag is set the generator ensures that the values for
              the keys in a dictionary are vertically aligned with each other,
              for a nice table effect. Setting this flag implies that indented
              is set as well.

              If this flag is not set (the default), the output is formatted
              as per the value of indented, without trying to align the values
              for dictionary keys.

       Note that this plugin ignores the standard configuration variables
       user, format, file, and name, and their values.
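
       The effect of the two flags can be illustrated with a small sketch in
       plain Tcl that lays out a flat dictionary of already-encoded JSON
       values. The sketch is hypothetical. json_layout is not part of this
       package, which leaves the real work to pt::peg::to::json, so the exact
       spacing the plugin produces may differ. Keys are assumed not to need
       JSON escaping.

```tcl
# Hypothetical helper (not part of pt::peg::export::json): lay out a flat
# Tcl dictionary whose values are already-encoded JSON text, following
# the documented meaning of the "indented" and "aligned" flags.
proc json_layout {d indented aligned} {
    # As documented, setting "aligned" implies "indented".
    if {$aligned} { set indented 1 }
    set keys [dict keys $d]
    if {!$indented} {
        # Everything on a single line, with minimal spacing.
        set parts {}
        foreach k $keys {
            lappend parts "\"$k\":[dict get $d $k]"
        }
        return "\{[join $parts ,]\}"
    }
    # Width of the widest quoted key; only used when aligning.
    set width 0
    foreach k $keys {
        set len [expr {[string length $k] + 2}]
        if {$len > $width} { set width $len }
    }
    # One key per line, with keys optionally padded so values form a column.
    set lines {}
    foreach k $keys {
        set qk "\"$k\""
        if {$aligned} { set qk [format %-*s $width $qk] }
        lappend lines "    $qk : [dict get $d $k]"
    }
    return "\{\n[join $lines ,\n]\n\}"
}

set d [dict create is {"n Number"} mode {"value"}]
puts [json_layout $d 0 0]
puts [json_layout $d 0 1]
```

       The sketch forces indented on whenever aligned is set, which mirrors
       the rule stated for the aligned flag.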

JSON GRAMMAR EXCHANGE FORMAT

       The JSON format for parsing expression grammars was written as a data
       exchange format not bound to Tcl. It was defined to allow the exchange
       of grammars with PackRat/PEG based parser generators for other
       languages.

       It is formally specified by the rules below:

       [1]    The JSON of any PEG is a JSON object.

       [2]    This object holds a single key, pt::grammar::peg, and its value.
              This value holds the contents of the grammar.

       [3]    The contents of the grammar are a JSON object holding the set of
              nonterminal symbols and the starting expression. The relevant
              keys and their values are

              rules  The value is a JSON object whose keys are the names of
                     the nonterminal symbols known to the grammar.

                     [1]    Each nonterminal symbol may occur only once.

                     [2]    The empty string is not a legal nonterminal
                            symbol.

                     [3]    The value for each symbol is a JSON object itself.
                            The relevant keys and their values in this object
                            are

                            is     The value is a JSON string holding the Tcl
                                   serialization of the parsing expression
                                   describing the symbol's sentential
                                   structure, as specified in the section PE
                                   serialization format.

                            mode   The value is a JSON string holding one of
                                   three values specifying how a parser should
                                   handle the semantic value produced by the
                                   symbol.

                                   value  The semantic value of the
                                          nonterminal symbol is an abstract
                                          syntax tree consisting of a single
                                          node for the nonterminal itself,
                                          which has the ASTs of the symbol's
                                          right hand side as its children.

                                   leaf   The semantic value of the
                                          nonterminal symbol is an abstract
                                          syntax tree consisting of a single
                                          node for the nonterminal, without
                                          any children. Any ASTs generated by
                                          the symbol's right hand side are
                                          discarded.

                                   void   The nonterminal has no semantic
                                          value. Any ASTs generated by the
                                          symbol's right hand side are
                                          discarded (as well).

              start  The value is a JSON string holding the Tcl serialization
                     of the start parsing expression of the grammar, as
                     specified in the section PE serialization format.

       [4]    The terminal symbols of the grammar are specified implicitly as
              the set of all terminal symbols used in the start expression and
              on the RHS of the grammar rules.
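
       The three modes differ only in what a matched nonterminal leaves
       behind in the abstract syntax tree. The following plain-Tcl sketch
       illustrates that difference. The proc reduce_ast and the node layout
       {symbol start end child ...} it uses are assumptions made for this
       illustration. They are not definitions taken from this document.

```tcl
# Hypothetical sketch (not part of Parser Tools): what a matched
# nonterminal contributes to the AST, depending on its mode.
# Assumed node layout: {symbol start end child ...}.
# The result is a list of nodes, which is empty for mode void.
proc reduce_ast {mode symbol start end children} {
    switch -exact -- $mode {
        value {
            # One node for the symbol, with the right hand side ASTs
            # as its children.
            return [list [list $symbol $start $end {*}$children]]
        }
        leaf {
            # One node for the symbol; the right hand side ASTs are dropped.
            return [list [list $symbol $start $end]]
        }
        void {
            # No semantic value at all; the right hand side ASTs are
            # dropped as well.
            return {}
        }
        default {
            error "unknown mode \"$mode\""
        }
    }
}

# Example: Number matched at offsets 0..1, with ASTs for Sign and Digit.
set kids {{Sign 0 0} {Digit 1 1}}
foreach mode {value leaf void} {
    puts "$mode: [reduce_ast $mode Number 0 1 $kids]"
}
```

       For void the empty result simply means that nothing is added to the
       children of the enclosing node.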

       As an aside to the advanced reader, this is essentially the same as the
       Tcl serialization of parsing expression grammars, as specified in
       section PEG serialization format, except that the Tcl dictionaries and
       lists of that format are mapped to JSON objects and arrays. Only the
       parsing expressions themselves are not translated further, but kept as
       JSON strings containing a nested Tcl list, and there is no concept of
       canonicity for the JSON either.
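
       Because the mapping is this direct, converting a Tcl serialization into
       the JSON format takes only a few lines. The sketch below is
       hypothetical. json_str and peg_to_json_sketch are illustrative names,
       not part of Parser Tools, whose actual conversion is done by
       pt::peg::to::json. The sketch writes everything on one line, escapes /
       as \/ the way the example below does, and keeps each parsing
       expression as a JSON string.

```tcl
# Hypothetical helper: quote a Tcl string as a JSON string literal.
# Double quote, backslash and slash are backslash-escaped (the slash to
# match the example output); other control characters become \u00XX.
proc json_str {s} {
    set out "\""
    foreach c [split $s ""] {
        scan $c %c code
        if {$c eq "\"" || $c eq "\\" || $c eq "/"} {
            append out \\ $c
        } elseif {$code < 0x20} {
            append out [format \\u%04x $code]
        } else {
            append out $c
        }
    }
    append out "\""
}

# Hypothetical sketch of the mapping described above: Tcl dictionaries
# become JSON objects, while each parsing expression stays a string.
proc peg_to_json_sketch {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set rules {}
    # Sorting only makes the output deterministic; JSON has no canonical form.
    foreach symbol [lsort -dictionary [dict keys [dict get $grammar rules]]] {
        set def [dict get $grammar rules $symbol]
        lappend rules [format {%s:{"is":%s,"mode":%s}} \
            [json_str $symbol] \
            [json_str [dict get $def is]] \
            [json_str [dict get $def mode]]]
    }
    return [format {{"pt::grammar::peg":{"rules":{%s},"start":%s}}} \
        [join $rules ,] [json_str [dict get $grammar start]]]
}

set serial {pt::grammar::peg {rules {Sign {is {/ {t -} {t +}} mode value}} start {n Sign}}}
puts [peg_to_json_sketch $serial]
```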

   EXAMPLE
       Given the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       a JSON serialization for it is

              {
                  "pt::grammar::peg" : {
                      "rules" : {
                          "AddOp"     : {
                              "is"   : "\/ {t +} {t -}",
                              "mode" : "value"
                          },
                          "Digit"     : {
                              "is"   : "\/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}",
                              "mode" : "value"
                          },
                          "Expression" : {
                              "is"   : "x {n Term} {* {x {n AddOp} {n Term}}}",
                              "mode" : "value"
                          },
                          "Factor"    : {
                              "is"   : "\/ {x {t (} {n Expression} {t )}} {n Number}",
                              "mode" : "value"
                          },
                          "MulOp"     : {
                              "is"   : "\/ {t *} {t \/}",
                              "mode" : "value"
                          },
                          "Number"    : {
                              "is"   : "x {? {n Sign}} {+ {n Digit}}",
                              "mode" : "value"
                          },
                          "Sign"      : {
                              "is"   : "\/ {t -} {t +}",
                              "mode" : "value"
                          },
                          "Term"      : {
                              "is"   : "x {n Factor} {* {x {n MulOp} {n Factor}}}",
                              "mode" : "value"
                          }
                      },
                      "start" : "n Expression"
                  }
              }

       and a Tcl serialization of the same is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}}                                                                mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
                      MulOp      {is {/ {t *} {t /}}                                                                mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
                      Sign       {is {/ {t -} {t +}}                                                                mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
                  }
                  start {n Expression}
              }

       The similarity of the latter to the JSON should be quite obvious.
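
       Rule [4] of the specification above means that the terminal symbols
       are implied by the rules rather than listed anywhere. The following
       hypothetical sketch computes that implicit set from a Tcl serialization
       like the one shown above. pe_terminals and peg_terminals are
       illustrative names, not Parser Tools commands. Only {t x} leaves are
       collected, not character classes such as alnum.

```tcl
# Hypothetical sketch: the terminal strings used in one parsing
# expression. Only {t x} leaves are collected; keywords such as alnum
# or dot, and nonterminal references, contribute nothing.
proc pe_terminals {pe} {
    switch -exact -- [lindex $pe 0] {
        t {
            return [list [lindex $pe 1]]
        }
        / - x - * - + - & - ! - ? {
            set result {}
            foreach sub [lrange $pe 1 end] {
                lappend result {*}[pe_terminals $sub]
            }
            return $result
        }
        default {
            return {}
        }
    }
}

# The implicit terminal set of a grammar: every terminal used in the
# start expression or on the right hand side of a rule, without duplicates.
proc peg_terminals {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set all [pe_terminals [dict get $grammar start]]
    dict for {symbol def} [dict get $grammar rules] {
        lappend all {*}[pe_terminals [dict get $def is]]
    }
    return [lsort -unique $all]
}

set serial {pt::grammar::peg {
    rules {
        Number {is {x {? {n Sign}} {+ {n Digit}}} mode value}
        Sign   {is {/ {t -} {t +}} mode value}
        Digit  {is {/ {t 0} {t 1}} mode value}
    }
    start {n Number}
}}
puts [peg_terminals $serial]
```

       Applied to the full calculator grammar, it returns the two
       parentheses, the four operator characters and the ten digits.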

PEG SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of them
       is canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg, and
                     its value. This value holds the contents of the grammar.

              [3]    The contents of the grammar are a Tcl dictionary holding
                     the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal nonterminal
                                   symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three values
                                          specifying how a parser should
                                          handle the semantic value produced
                                          by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified in
                            the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in the
                     start expression and on the RHS of the grammar rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally satisfies
              the constraints below, which make it unique among all the
              possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary. That is, it does not
                     contain superfluous whitespace.
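
       The two constraints can be met mechanically. The following plain-Tcl
       sketch rebuilds every nested dictionary of a regular serialization with
       its keys in lsort -dictionary order, and lets Tcl generate the string
       representation of the result, which carries no superfluous whitespace.
       The name peg_canonical is hypothetical. The input is assumed to be a
       valid regular serialization. The parsing expression strings are copied
       unchanged rather than canonicalized themselves.

```tcl
# Hypothetical sketch: produce the canonical form of a regular PEG
# serialization. Every nested dictionary is rebuilt with its keys sorted
# by "lsort -dictionary"; parsing expressions are copied unchanged.
proc peg_canonical {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set rules [dict get $grammar rules]
    set newrules [dict create]
    foreach symbol [lsort -dictionary [dict keys $rules]] {
        set def [dict get $rules $symbol]
        set newdef [dict create]
        foreach key [lsort -dictionary [dict keys $def]] {
            dict set newdef $key [dict get $def $key]
        }
        dict set newrules $symbol $newdef
    }
    # Keys of the grammar contents: "rules" sorts before "start".
    set contents [dict create]
    foreach key [lsort -dictionary [dict keys $grammar]] {
        if {$key eq "rules"} {
            dict set contents rules $newrules
        } else {
            dict set contents $key [dict get $grammar $key]
        }
    }
    return [dict create pt::grammar::peg $contents]
}

set messy {pt::grammar::peg {start {n A}   rules {B {mode leaf is {t b}} A {mode value   is {n B}}}}}
puts [peg_canonical $messy]
```

       Rebuilding through dict create, rather than editing the input string,
       leaves the final formatting entirely to Tcl. Tcl's generated string
       representation of a dictionary is exactly the whitespace-free form
       that constraint [2] asks for.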

   EXAMPLE
       Given the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}}                                                                mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
                      MulOp      {is {/ {t *} {t /}}                                                                mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
                      Sign       {is {/ {t -} {t +}}                                                                mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
                  }
                  start {n Expression}
              }
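
       Reading such a serialization back into the notation of the grammar is
       equally mechanical. The following hypothetical sketch renders a
       serialized parsing expression, in the format described in section PE
       serialization format, as rule text. pe_prec and pe_to_text are not
       Parser Tools commands. Their precedence and parenthesization rules are
       assumptions chosen for readability. Atomic keywords such as alnum are
       rendered as <alnum>, which is likewise an assumption of the sketch.

```tcl
# Hypothetical sketch: binding strength of the operator at the root of a
# serialized parsing expression, from choice (weakest) to atoms.
proc pe_prec {pe} {
    switch -exact -- [lindex $pe 0] {
        /         { return 1 }
        x         { return 2 }
        & - !     { return 3 }
        * - + - ? { return 4 }
        default   { return 5 }
    }
}

# Render pe in textual PEG notation, adding parentheses when the root
# operator binds less tightly than the context requires (min).
proc pe_to_text {pe {min 1}} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        t { set text '[lindex $pe 1]' }
        n { set text [lindex $pe 1] }
        / - x {
            # Operands of a choice may be sequences; operands of a
            # sequence must not be choices or sequences.
            set submin [expr {$op eq "/" ? 2 : 3}]
            set sep    [expr {$op eq "/" ? " / " : " "}]
            set parts {}
            foreach sub [lrange $pe 1 end] {
                lappend parts [pe_to_text $sub $submin]
            }
            set text [join $parts $sep]
        }
        & - ! { set text $op[pe_to_text [lindex $pe 1] 3] }
        * - + - ? { set text [pe_to_text [lindex $pe 1] 5]$op }
        default { set text <$op> }
    }
    if {[pe_prec $pe] < $min} { set text ($text) }
    return $text
}

set serial {pt::grammar::peg {rules {
    Number {is {x {? {n Sign}} {+ {n Digit}}} mode value}
    Factor {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
} start {n Expression}}}
dict for {symbol def} [dict get $serial pt::grammar::peg rules] {
    puts "$symbol <- [pe_to_text [dict get $def is]] ;"
}
```

       Quoting of terminals that themselves contain a quote character is
       not handled by this sketch.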

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison, etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them is canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression. It
                            matches any character.

                     [3]    The string alnum is an atomic parsing expression.
                            It matches any Unicode alphabetic or digit
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing expression.
                            It matches any Unicode alphabetic character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [5]    The string ascii is an atomic parsing expression.
                            It matches any Unicode character below U+0080.
                            This is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing expression.
                            It matches any Unicode digit character. Note that
                            this includes characters outside of the [0..9]
                            range. This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing expression.
                            It matches any Unicode printing character, except
                            for space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [9]    The string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabetic
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing expression.
                            It matches any Unicode printing character,
                            including space. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [11]   The string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing expression.
                            It matches any Unicode space character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command string is.

                     [13]   The string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabetic
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word character.
                            This is any alphanumeric character (see alnum),
                            and any connector punctuation character (e.g.
                            underscore). This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [15]   The string xdigit is an atomic parsing expression.
                            It matches any hexadecimal digit character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [16]   The string ddigit is an atomic parsing expression.
                            It matches any decimal digit character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is a parsing expression as
                            well. This is the ordered choice, also known as
                            prioritized choice.

                     [2]    For parsing expressions e1, e2, ... the result of
                            [list x e1 e2 ... ] is a parsing expression as
                            well. This is the sequence.

                     [3]    For a parsing expression e the result of
                            [list * e] is a parsing expression as well. This
                            is the Kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of
                            [list + e] is a parsing expression as well. This
                            is the positive Kleene closure, describing one or
                            more repetitions.

                     [5]    For a parsing expression e the result of
                            [list & e] is a parsing expression as well. This
                            is the and lookahead predicate.

                     [6]    For a parsing expression e the result of
                            [list ! e] is a parsing expression as well. This
                            is the not lookahead predicate.

                     [7]    For a parsing expression e the result of
                            [list ? e] is a parsing expression as well. This
                            is the optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among all
              the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list. That is, it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).
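
       The regular forms above amount to a small recursive grammar, so
       checking whether a value is a well-formed parsing expression is a short
       recursive walk. The following sketch is hypothetical. pe_valid is not a
       Parser Tools command. The sketch accepts exactly the atomic and
       combined forms listed above. It requires at least one operand for /
       and x, which is an assumption, since the text only says e1, e2, ...
       It does not check that referenced nonterminals are defined or that the
       canonical constraints hold.

```tcl
# Hypothetical sketch: return 1 if pe is a parsing expression in the
# regular serialization described above, and 0 otherwise.
proc pe_valid {pe} {
    # Atomic expressions written as a bare keyword.
    set keywords {
        epsilon dot alnum alpha ascii control digit graph
        lower print punct space upper wordchar xdigit ddigit
    }
    # Reject strings that are not proper Tcl lists, and empty lists.
    if {[catch {llength $pe} len] || $len == 0} { return 0 }
    set op [lindex $pe 0]
    if {$len == 1} {
        return [expr {$op in $keywords}]
    }
    switch -exact -- $op {
        t {
            # [list t x]: exactly one terminal string.
            return [expr {$len == 2}]
        }
        n {
            # [list n A]: exactly one nonterminal, which must not be empty.
            return [expr {$len == 2 && [lindex $pe 1] ne ""}]
        }
        / - x {
            # Ordered choice and sequence: one or more operands.
        }
        * - + - & - ! - ? {
            # Repetitions, lookahead predicates and optional input:
            # exactly one operand.
            if {$len != 2} { return 0 }
        }
        default {
            return 0
        }
    }
    foreach sub [lrange $pe 1 end] {
        if {![pe_valid $sub]} { return 0 }
    }
    return 1
}

puts [pe_valid {x {n Term} {* {x {n AddOp} {n Term}}}}]
puts [pe_valid {* {n A} {n B}}]
```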

   EXAMPLE
       Given the parsing expression shown on the right-hand side of the rule

                  Expression <- Term (AddOp Term)*

       its canonical serialization is

                  x {n Term} {* {x {n AddOp} {n Term}}}
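
       Constraint [1] can be met by rebuilding an expression recursively as a
       pure list, so that Tcl regenerates its string representation without
       superfluous whitespace. The following sketch is hypothetical. It
       assumes a well-formed input and addresses constraint [1] only.
       pe_canonical is not a Parser Tools command. Applied to a loosely
       spaced version of the expression above, it yields the canonical
       string shown.

```tcl
# Hypothetical sketch: canonicalize a well-formed parsing expression by
# rebuilding it, recursively, as a pure Tcl list.
proc pe_canonical {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        / - x - * - + - & - ! - ? {
            # Combined expression: keep the operator, rebuild each operand.
            set result [list $op]
            foreach sub [lrange $pe 1 end] {
                lappend result [pe_canonical $sub]
            }
            return $result
        }
        default {
            # Atomic expression (keyword, {t x} or {n A}): rebuilding it
            # with list drops any extra whitespace.
            return [list {*}$pe]
        }
    }
}

set messy "x   {n Term}\n    {*  { x {n AddOp}   {n  Term} }}"
puts [pe_canonical $messy]
```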

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       EBNF, JSON, LL(k), PEG, TDPL, context-free languages, export,
       expression, grammar, matching, parser, parsing expression, parsing
       expression grammar, plugin, push down automaton, recursive descent,
       serialization, state, top-down parsing languages, transducer

CATEGORY

       Parsing and Grammars

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1              pt::peg::export::json(n)