pt::peg::export::container(n)    Parser Tools    pt::peg::export::container(n)

______________________________________________________________________________

NAME

       pt::peg::export::container - PEG Export Plugin. Write CONTAINER format

SYNOPSIS

       package require Tcl  8.5

       package require pt::peg::export::container  ?1?

       package require pt::peg::to::container

       export serial configuration

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. That document is the entry point to the whole system
       the current package is a part of.

       This package implements the parsing expression grammar export plugin
       for the generation of CONTAINER markup.

       It resides in the Export section of the Core Layer of Parser Tools and
       is intended to be used by pt::peg::export, the export manager, sitting
       between it and the corresponding core conversion functionality provided
       by pt::peg::to::container.

       IMAGE: arch_core_eplugins

       While the direct use of this package with a regular interpreter is
       possible, it is strongly discouraged and requires a number of
       contortions to provide the expected environment. The proper way to
       use this functionality depends on the situation:

       [1]    In an untrusted environment the proper access is through the
              package pt::peg::export and the export manager objects it
              provides.

       [2]    In a trusted environment, however, simply use the package
              pt::peg::to::container and access the core conversion
              functionality directly.
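
       For the trusted case in [2], direct use might look roughly like the
       sketch below. It assumes that Tcllib is installed and that
       pt::peg::to::container provides the commands reset, configure and
       convert, with a -name option, as described in that package's own
       manpage. The one-rule grammar and the name calculator are made up
       for illustration.

```tcl
package require Tcl 8.5
package require pt::peg::to::container ;# assumes Tcllib is on the package path

# Hypothetical one-rule grammar. Building it with [list] keeps the string
# representation free of superfluous whitespace, i.e. canonical.
set rules  [list Digit [list is [list t 0] mode value]]
set serial [list pt::grammar::peg [list rules $rules start [list n Digit]]]

# Assumed converter API: reset to defaults, set the grammar name, convert.
pt::peg::to::container reset
pt::peg::to::container configure -name calculator
set markup [pt::peg::to::container convert $serial]
puts $markup
```

       The converter expects a canonical serialization, which is why the
       value is assembled with list rather than written by hand.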

API

       The API provided by this package satisfies the specification of the
       Plugin API found in the Parser Tools Export API specification.

       export serial configuration
              This command takes the canonical serialization of a parsing
              expression grammar, as specified in section PEG serialization
              format and contained in serial, together with the
              configuration, a dictionary, and generates CONTAINER markup
              encoding the grammar. The created string is then returned as
              the result of the command.

CONFIGURATION

       The CONTAINER export plugin recognizes the following configuration
       variables and changes its behaviour as they specify.

       enum mode
              The value of this configuration variable controls which methods
              of pt::peg instances the plugin will use to specify the grammar.
              There are two legal values:

              bulk   In this mode the methods start, add, modes, and rules are
                     used to specify the grammar in a bulk manner, i.e. as a
                     set of nonterminal symbols, and two dictionaries mapping
                     from the symbols to their semantic modes and parsing
                     expressions.

                     This mode is the default.

              incremental
                     In this mode the methods start, add, mode, and rule are
                     used to specify the grammar piecemeal, with each
                     nonterminal having its own block of defining commands.

       string template
              If this configuration variable is set it is assumed to contain a
              string into which to put the generated code and other
              configuration data. The various locations are expected to be
              specified with the following placeholders:

              @user@ To be replaced with the value of the configuration
                     variable user.

              @format@
                     To be replaced with the constant CONTAINER.

              @file@ To be replaced with the value of the configuration
                     variable file.

              @name@ To be replaced with the value of the configuration
                     variable name.

              @mode@ To be replaced with the value of the configuration
                     variable mode.

              @code@ To be replaced with the generated code.

       If this configuration variable is not set, or empty, then the plugin
       falls back to a standard template, which is defined as "@code@".

       Note that this plugin may ignore the standard configuration variables
       user, format, file, and their values, depending on the chosen template.

       The content of the standard configuration variable name, if set, is
       used as name of the grammar in the output. Otherwise the plugin falls
       back to the default name a_pe_grammar.
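
       The placeholder handling described above can be pictured with plain
       string map. This is only a sketch of the documented behaviour, not
       the plugin's actual code; the proc name fill_template and the layout
       of its config dictionary are assumptions.

```tcl
# Illustrative only: fill a template as the placeholder list describes.
proc fill_template {template code config} {
    # An unset or empty template falls back to the standard one.
    if {$template eq ""} { set template "@code@" }
    # The grammar name defaults to a_pe_grammar when not configured.
    set name a_pe_grammar
    if {[dict exists $config name]} { set name [dict get $config name] }
    set map [list @format@ CONTAINER @name@ $name @code@ $code]
    foreach key {user file mode} {
        set value ""
        if {[dict exists $config $key]} { set value [dict get $config $key] }
        lappend map @$key@ $value
    }
    # string map replaces in a single pass, so inserted text is not rescanned.
    return [string map $map $template]
}

set template "# grammar @name@ (@format@), generated by @user@\n@code@"
puts [fill_template $template "snit::type demo {}" {user alice name demo}]
```

       A template that omits @user@, @format@ or @file@ simply drops those
       values, which matches the note that the plugin may ignore them.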

GRAMMAR CONTAINER

       The container format is another form of describing parsing expression
       grammars. While data in this format is executable it does not
       constitute a parser for the grammar. It always has to be used in
       conjunction with the package pt::peg::interp, a grammar interpreter.

       The format represents grammars by a snit::type, i.e. class, whose
       instances are API-compatible to the instances of the pt::peg::container
       package, and which are preloaded with the grammar in question.

       It has no direct formal specification beyond what was said above.

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       one possible CONTAINER serialization for it is

              snit::type a_pe_grammar {
                  constructor {} {
                      install myg using pt::peg::container ${selfns}::G
                      $myg start {n Expression}
                      $myg add   AddOp Digit Expression Factor MulOp Number Sign Term
                      $myg modes {
                          AddOp      value
                          Digit      value
                          Expression value
                          Factor     value
                          MulOp      value
                          Number     value
                          Sign       value
                          Term       value
                      }
                      $myg rules {
                          AddOp      {/ {t -} {t +}}
                          Digit      {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}
                          Expression {x {n Term} {* {x {n AddOp} {n Term}}}}
                          Factor     {/ {x {t \50} {n Expression} {t \51}} {n Number}}
                          MulOp      {/ {t *} {t /}}
                          Number     {x {? {n Sign}} {+ {n Digit}}}
                          Sign       {/ {t -} {t +}}
                          Term       {x {n Factor} {* {x {n MulOp} {n Factor}}}}
                      }
                      return
                  }

                  component myg
                  delegate method * to myg
              }

PEG SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of them
       will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg, and
                     its value. This value holds the contents of the grammar.

              [3]    The contents of the grammar are a Tcl dictionary holding
                     the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal nonterminal
                                   symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three values
                                          specifying how a parser should
                                          handle the semantic value produced
                                          by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified
                            in the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in
                     the start expression and on the RHS of the grammar
                     rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally satisfies
              the constraints below, which make it unique among all the
              possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary. I.e. it does not
                     contain superfluous whitespace.
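
       The two constraints can be checked for a single dictionary level
       with a few lines of plain Tcl. The proc name is_canonical_dict is
       hypothetical, and a complete check would have to apply the same test
       to every nested dictionary; this sketch is not the verification code
       used by Parser Tools.

```tcl
# Sketch: check constraints [1] and [2] for one dictionary level.
proc is_canonical_dict {serial} {
    set keys [dict keys $serial]
    # [1] keys must already be in ascending dictionary order
    if {$keys ne [lsort -increasing -dict $keys]} { return 0 }
    # [2] no superfluous whitespace: rebuilding the list changes nothing
    return [expr {$serial eq [list {*}$serial]}]
}

puts [is_canonical_dict {rules {} start {n Expression}}]    ;# sorted, compact
puts [is_canonical_dict {start {n Expression} rules {}}]    ;# keys out of order
puts [is_canonical_dict {rules  {}   start {n Expression}}] ;# extra spaces
```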

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t -} {t +}}                                                                mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
                      MulOp      {is {/ {t *} {t /}}                                                                mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
                      Sign       {is {/ {t -} {t +}}                                                                mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
                  }
                  start {n Expression}
              }
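
       Because a serialization is an ordinary nested Tcl dictionary, the
       implicit terminal set mentioned in the format description can be
       recovered with dict and list commands alone. The sketch below uses
       hypothetical proc names, handles only the expression forms listed in
       this document, and works on a small made-up grammar in regular (not
       canonical) form.

```tcl
# Collect the terminals used by one parsing expression.
proc pe_terminals {pe} {
    set op [lindex $pe 0]
    if {$op eq "t"} { return [list [lindex $pe 1]] }
    set result {}
    if {$op in {/ x * + & ! ?}} {
        # Combined expression: visit every operand.
        foreach child [lrange $pe 1 end] {
            lappend result {*}[pe_terminals $child]
        }
    }
    return $result
}

# Terminals of a whole grammar: start expression plus all rule bodies.
proc peg_terminals {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set found [pe_terminals [dict get $grammar start]]
    dict for {symbol def} [dict get $grammar rules] {
        lappend found {*}[pe_terminals [dict get $def is]]
    }
    return [lsort -unique $found]
}

set serial {pt::grammar::peg {
    rules {
        Sign   {is {/ {t -} {t +}} mode value}
        Number {is {x {? {n Sign}} {+ {t 1}}} mode value}
    }
    start {n Number}
}}
puts [peg_terminals $serial]
```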

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison,
       etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression. It
                            matches any character.

                     [3]    The string alnum is an atomic parsing expression.
                            It matches any Unicode alphabetic or digit
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing expression.
                            It matches any Unicode alphabetic character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [5]    The string ascii is an atomic parsing expression.
                            It matches any Unicode character below U+0080.
                            This is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing expression.
                            It matches any Unicode digit character. Note that
                            this includes characters outside of the [0..9]
                            range. This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing expression.
                            It matches any Unicode printing character, except
                            for space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [9]    The string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabetic
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing expression.
                            It matches any Unicode printing character,
                            including space. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [11]   The string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing expression.
                            It matches any Unicode space character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command string is.

                     [13]   The string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabetic
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word character.
                            This is any alphanumeric character (see alnum),
                            and any connector punctuation characters (e.g.
                            underscore). This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [15]   The string xdigit is an atomic parsing expression.
                            It matches any hexadecimal digit character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [16]   The string ddigit is an atomic parsing expression.
                            It matches any decimal digit character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.
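
       Since most of the named atoms are defined in terms of string is,
       their meaning for a single character can be tried out directly in a
       Tcl shell. The helper below is a sketch with a hypothetical name,
       atom_matches, and is not taken from the Parser Tools sources.

```tcl
# Sketch: test one character against a named atomic parsing expression.
proc atom_matches {atom ch} {
    switch -exact -- $atom {
        epsilon { return [expr {$ch eq ""}] }
        dot     { return [expr {[string length $ch] == 1}] }
        ddigit  { return [regexp {^[0-9]$} $ch] }
        alnum - alpha - ascii - control - digit - graph - lower -
        print - punct - space - upper - wordchar - xdigit {
            return [string is $atom -strict $ch]
        }
        default { error "unknown atomic parsing expression \"$atom\"" }
    }
}

puts [atom_matches digit 5]
puts [atom_matches alpha 5]
puts [atom_matches wordchar _]
```

       The -strict flag is used so that an empty string does not count as a
       member of a character class.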

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is a parsing expression as
                            well. This is the ordered choice, aka prioritized
                            choice.

                     [2]    For parsing expressions e1, e2, ... the result of
                            [list x e1 e2 ... ] is a parsing expression as
                            well. This is the sequence.

                     [3]    For a parsing expression e the result of [list *
                            e] is a parsing expression as well. This is the
                            kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of [list +
                            e] is a parsing expression as well. This is the
                            positive kleene closure, describing one or more
                            repetitions.

                     [5]    For a parsing expression e the result of [list &
                            e] is a parsing expression as well. This is the
                            and lookahead predicate.

                     [6]    For a parsing expression e the result of [list !
                            e] is a parsing expression as well. This is the
                            not lookahead predicate.

                     [7]    For a parsing expression e the result of [list ?
                            e] is a parsing expression as well. This is the
                            optional input.
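
       The [list ...] forms above compose directly. As a small sketch in
       plain Tcl, with no Parser Tools package involved, the right-hand
       sides of three rules from the calculator grammar can be built like
       this (the variable names are arbitrary):

```tcl
# Sign <- '-' / '+'          ordered choice of two terminals
set sign [list / [list t -] [list t +]]

# Number <- Sign? Digit+     sequence of an optional and a repetition
set number [list x [list ? [list n Sign]] [list + [list n Digit]]]

# Expression <- Term (AddOp Term)*
set term [list n Term]
set expression [list x $term [list * [list x [list n AddOp] $term]]]

puts $sign
puts $number
puts $expression
```

       Building the values with list, rather than typing the braces by
       hand, leaves the quoting to Tcl and avoids stray whitespace.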

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among all
              the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

                  Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}
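
       Constraint [1] of the canonical form can be reached from any regular
       serialization by rebuilding the nested lists. The following is a
       minimal sketch with a hypothetical proc name, pe_canonical. It covers
       only the expression forms listed in this document and does not deal
       with constraint [2].

```tcl
# Sketch: rebuild a parsing expression as a pure Tcl list at every level.
proc pe_canonical {pe} {
    set op [lindex $pe 0]
    if {$op in {/ x * + & ! ?}} {
        # Combined expression: normalize each operand recursively.
        set out [list $op]
        foreach child [lrange $pe 1 end] {
            lappend out [pe_canonical $child]
        }
        return $out
    }
    # Atomic expression: rebuilding the list drops extra whitespace.
    return [list {*}$pe]
}

set sloppy "x   {n  Term}  {* {x  {n AddOp} {n Term}}}"
puts [pe_canonical $sloppy]
```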

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       CONTAINER, EBNF, LL(k), PEG, TDPL, context-free languages, export,
       expression, grammar, matching, parser, parsing expression, parsing
       expression grammar, plugin, push down automaton, recursive descent,
       serialization, state, top-down parsing languages, transducer

CATEGORY

       Parsing and Grammars

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1         pt::peg::export::container(n)