pt::peg::to::container(n)        Parser Tools        pt::peg::to::container(n)

______________________________________________________________________________

NAME

       pt::peg::to::container - PEG Conversion. Write CONTAINER format

SYNOPSIS

       package require Tcl  8.5

       package require pt::peg::to::container  ?1?

       package require pt::peg

       package require text::write

       package require char

       pt::peg::to::container reset

       pt::peg::to::container configure

       pt::peg::to::container configure option

       pt::peg::to::container configure option value...

       pt::peg::to::container convert serial

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. That document is the entry point to the whole system the
       current package is a part of.

       This package implements the converter from parsing expression grammars
       to CONTAINER markup.

       It resides in the Export section of the Core Layer of Parser Tools, and
       can be used either directly with the other packages of this layer, or
       indirectly through the export manager provided by pt::peg::export. The
       latter is intended for use in untrusted environments and is done through
       the corresponding export plugin pt::peg::export::container sitting
       between converter and export manager.

       IMAGE: arch_core_eplugins

API

       The API provided by this package satisfies the specification of the
       Converter API found in the Parser Tools Export API specification.

       pt::peg::to::container reset
              This command resets the configuration of the package to its
              default settings.

       pt::peg::to::container configure
              This command returns a dictionary containing the current
              configuration of the package.

       pt::peg::to::container configure option
              This command returns the current value of the specified
              configuration option of the package. For the set of legal
              options, please read the section Options.

       pt::peg::to::container configure option value...
              This command sets the given configuration options of the
              package to the specified values. For the set of legal options,
              please read the section Options.

       pt::peg::to::container convert serial
              This command takes the canonical serialization of a parsing
              expression grammar, as specified in section PEG serialization
              format, and contained in serial, and generates CONTAINER markup
              encoding the grammar, per the current package configuration.
              The created string is then returned as the result of the
              command.

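       As a minimal sketch of the commands above, the following script
       converts the canonical serialization of a tiny grammar. The grammar,
       its nonterminal A, and the name tiny_grammar are hypothetical and
       chosen for illustration only. The script assumes that Tcllib is
       installed.

```tcl
package require Tcl 8.5
package require pt::peg::to::container

# Canonical serialization of a hypothetical one-rule grammar: A <- 'a'
set serial {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}

# Start from the default settings, then name the generated class.
pt::peg::to::container reset
pt::peg::to::container configure -name tiny_grammar

# Query a single option, then the complete configuration dictionary.
puts [pt::peg::to::container configure -name]
puts [pt::peg::to::container configure]

# Generate the CONTAINER markup and print it.
set code [pt::peg::to::container convert $serial]
puts $code
```

       Calling reset first keeps the result independent of any options set
       earlier in the same interpreter.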

OPTIONS

       The converter to the CONTAINER format recognizes the following options
       and changes its behaviour as they specify.

       -file string
              The value of this option is the name of the file or other entity
              from which the grammar came, for which the command is run. The
              default value is unknown.

       -name string
              The value of this option is the name of the grammar we are
              processing. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the
              command is run. The default value is unknown.

       -mode bulk|incremental
              The value of this option controls which methods of
              pt::peg::container instances are used to specify the grammar,
              i.e. preload it into the container. There are two legal values,
              as listed below. The default is bulk.

              bulk   In this mode the methods start, add, modes, and rules are
                     used to specify the grammar in a bulk manner, i.e. as a
                     set of nonterminal symbols, and two dictionaries mapping
                     from the symbols to their semantic modes and parsing
                     expressions.

                     This mode is the default.

              incremental
                     In this mode the methods start, add, mode, and rule are
                     used to specify the grammar piecemeal, with each
                     nonterminal having its own block of defining commands.

       -template string
              The value of this option is a string into which to put the
              generated code and the other configuration settings. The various
              locations for user-data are expected to be specified with the
              placeholders listed below. The default value is "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant CONTAINER.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @mode@ To be replaced with the value of the option -mode.

              @code@ To be replaced with the generated code.

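       The options above can be combined in a single configure call. The
       sketch below wraps the generated code in a comment header through
       -template. The user name, file name, grammar name, and the layout of
       the header are hypothetical values chosen for illustration, and the
       one-rule grammar is likewise an assumption.

```tcl
package require Tcl 8.5
package require pt::peg::to::container

# Hypothetical template: a comment header followed by the generated code.
set template "# Grammar @name@ (@format@, mode @mode@)\n"
append template "# Source: @file@, generated for @user@\n\n@code@"

pt::peg::to::container reset
pt::peg::to::container configure \
    -user     alice \
    -file     calculator.peg \
    -name     calculator \
    -mode     incremental \
    -template $template

# Hypothetical one-rule grammar (A <- 'a') in canonical form.
set serial {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}

# The result is the template with all placeholders filled in.
set out [pt::peg::to::container convert $serial]
puts $out
```

       With -mode incremental the generated body uses the mode and rule
       methods per nonterminal instead of the bulk modes and rules
       dictionaries.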

GRAMMAR CONTAINER

       The container format is another form of describing parsing expression
       grammars. While data in this format is executable it does not
       constitute a parser for the grammar. It always has to be used in
       conjunction with the package pt::peg::interp, a grammar interpreter.

       The format represents grammars by a snit::type, i.e. class, whose
       instances are API-compatible to the instances of the pt::peg::container
       package, and which are preloaded with the grammar in question.

       It has no direct formal specification beyond what was said above.

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;


       one possible CONTAINER serialization for it is

              snit::type a_pe_grammar {
                  constructor {} {
                      install myg using pt::peg::container ${selfns}::G
                      $myg start {n Expression}
                      $myg add   AddOp Digit Expression Factor MulOp Number Sign Term
                      $myg modes {
                          AddOp      value
                          Digit      value
                          Expression value
                          Factor     value
                          MulOp      value
                          Number     value
                          Sign       value
                          Term       value
                      }
                      $myg rules {
                          AddOp      {/ {t -} {t +}}
                          Digit      {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}
                          Expression {x {n Term} {* {x {n AddOp} {n Term}}}}
                          Factor     {/ {x {t \50} {n Expression} {t \51}} {n Number}}
                          MulOp      {/ {t *} {t /}}
                          Number     {x {? {n Sign}} {+ {n Digit}}}
                          Sign       {/ {t -} {t +}}
                          Term       {x {n Factor} {* {x {n MulOp} {n Factor}}}}
                      }
                      return
                  }

                  component myg
                  delegate method * to myg
              }

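       Because the generated markup is executable Tcl, it can be evaluated to
       define the class and then instantiated. The sketch below assumes that
       Tcllib is installed and uses a hypothetical one-rule grammar named
       tiny_grammar. The methods start and nonterminals are those of
       pt::peg::container, reached through the delegation shown in the
       example above.

```tcl
package require Tcl 8.5
package require snit
package require pt::peg::container
package require pt::peg::to::container

# Hypothetical one-rule grammar (A <- 'a') in canonical form.
set serial {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}

pt::peg::to::container reset
pt::peg::to::container configure -name tiny_grammar

# Evaluating the generated markup defines the class tiny_grammar.
uplevel #0 [pt::peg::to::container convert $serial]

# Each instance is a grammar container preloaded with the grammar.
tiny_grammar G
puts [G start]          ;# the start expression of the grammar
puts [G nonterminals]   ;# the nonterminal symbols known to the grammar
```

       To actually parse input, such an instance is handed to the grammar
       interpreter pt::peg::interp, as noted at the top of this section.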

PEG SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of
       them will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg, and
                     its value. This value holds the contents of the grammar.

              [3]    The contents of the grammar are a Tcl dictionary holding
                     the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal nonterminal
                                   symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three values
                                          specifying how a parser should
                                          handle the semantic value produced
                                          by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are discarded
                                                 (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified in
                            the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in the
                     start expression and on the RHS of the grammar rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally satisfies
              the constraints below, which make it unique among all the
              possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary. I.e. it does not
                     contain superfluous whitespace.

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;


       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t -} {t +}}                                                                mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
                      MulOp      {is {/ {t *} {t /}}                                                                mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
                      Sign       {is {/ {t -} {t +}}                                                                mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
                  }
                  start {n Expression}
              }

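       Since the serialization is a nested Tcl dictionary, it can be
       inspected with the builtin dict command alone. The sketch below uses a
       hypothetical two-rule grammar. The helper keysSorted is not part of
       Parser Tools; it only illustrates constraint [1] of the canonical
       serialization for a single dictionary level.

```tcl
package require Tcl 8.5

# Canonical serialization of a hypothetical two-rule grammar.
set serial {pt::grammar::peg {rules {Digit {is {/ {t 0} {t 1}} mode leaf} Number {is {+ {n Digit}} mode value}} start {n Number}}}

# Rule [2]: a single top-level key, pt::grammar::peg.
set grammar [dict get $serial pt::grammar::peg]

# Rule [3]: the keys rules and start.
set start [dict get $grammar start]
dict for {symbol def} [dict get $grammar rules] {
    puts "$symbol: mode=[dict get $def mode] is=[dict get $def is]"
}

# Hypothetical helper: are the keys of one dictionary level in
# ascending dictionary order, as the canonical form requires?
proc keysSorted {d} {
    set keys [dict keys $d]
    return [expr {$keys eq [lsort -increasing -dictionary $keys]}]
}
puts [keysSorted [dict get $grammar rules]]
```

       A complete check would apply the helper to every nested dictionary and
       also compare the string representation against the canonical form.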

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison, etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization only
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression. It
                            matches any character.

                     [3]    The string alnum is an atomic parsing expression.
                            It matches any Unicode alphabet or digit
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing expression.
                            It matches any Unicode alphabet character. This is
                            a custom extension of PEs based on Tcl's builtin
                            command string is.

                     [5]    The string ascii is an atomic parsing expression.
                            It matches any Unicode character below U0080. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing expression.
                            It matches any Unicode digit character. Note that
                            this includes characters outside of the [0..9]
                            range. This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing expression.
                            It matches any Unicode printing character, except
                            for space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [9]    The string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing expression.
                            It matches any Unicode printing character,
                            including space. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [11]   The string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing expression.
                            It matches any Unicode space character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command string is.

                     [13]   The string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word character.
                            This is any alphanumeric character (see alnum),
                            and any connector punctuation characters (e.g.
                            underscore). This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [15]   The string xdigit is an atomic parsing expression.
                            It matches any hexadecimal digit character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [16]   The string ddigit is an atomic parsing expression.
                            It matches any decimal digit character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is a parsing expression as
                            well. This is the ordered choice, aka prioritized
                            choice.

                     [2]    For parsing expressions e1, e2, ... the result of
                            [list x e1 e2 ... ] is a parsing expression as
                            well. This is the sequence.

                     [3]    For a parsing expression e the result of [list *
                            e] is a parsing expression as well. This is the
                            kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of [list +
                            e] is a parsing expression as well. This is the
                            positive kleene closure, describing one or more
                            repetitions.

                     [5]    For a parsing expression e the result of [list &
                            e] is a parsing expression as well. This is the
                            and lookahead predicate.

                     [6]    For a parsing expression e the result of [list !
                            e] is a parsing expression as well. This is the
                            not lookahead predicate.

                     [7]    For a parsing expression e the result of [list ?
                            e] is a parsing expression as well. This is the
                            optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among all
              the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

                  Expression <- Term (AddOp Term)*


       then its canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}

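       The constructors listed above are ordinary Tcl list commands, so a
       serialization can be built and traversed with list operations alone.
       The sketch below builds the expression from the example and walks it
       recursively. The helper nonterminalsOf is hypothetical and not part of
       Parser Tools.

```tcl
package require Tcl 8.5

# Build the serialization of: Term (AddOp Term)*
set pe [list x [list n Term] [list * [list x [list n AddOp] [list n Term]]]]
puts $pe

# Hypothetical helper: collect the nonterminals referenced by a
# parsing expression, sorted and without duplicates.
proc nonterminalsOf {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        n { return [list [lindex $pe 1]] }
        t { return {} }
        / - x - * - + - & - ! - ? {
            # Combined expression: recurse into all operands.
            set result {}
            foreach sub [lrange $pe 1 end] {
                lappend result {*}[nonterminalsOf $sub]
            }
            return [lsort -unique $result]
        }
        default {
            # epsilon, dot, alnum, ...: atomic, no symbols referenced.
            return {}
        }
    }
}
puts [nonterminalsOf $pe]
```

       Building the value with list, rather than writing the string by hand,
       yields the canonical list representation without superfluous
       whitespace.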

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       CONTAINER, EBNF, LL(k), PEG, TDPL, context-free languages, conversion,
       expression, format conversion, grammar, matching, parser, parsing
       expression, parsing expression grammar, push down automaton, recursive
       descent, serialization, state, top-down parsing languages, transducer

CATEGORY

       Parsing and Grammars

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>



tcllib                                 1             pt::peg::to::container(n)