pt_peg_to_peg(n)

pt::peg::to::peg(n)              Parser Tools              pt::peg::to::peg(n)

______________________________________________________________________________

NAME

       pt::peg::to::peg - PEG Conversion. Write PEG format

SYNOPSIS

       package require Tcl 8.5

       package require pt::peg::to::peg ?1.0.2?

       package require pt::peg

       package require pt::pe

       package require text::write

       pt::peg::to::peg reset

       pt::peg::to::peg configure

       pt::peg::to::peg configure option

       pt::peg::to::peg configure option value...

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In that
       case please read the overview provided by the Introduction to Parser
       Tools. That document is the entry point to the whole system the current
       package is a part of.

       This package implements the converter from parsing expression grammars
       to PEG markup.

       It resides in the Export section of the Core Layer of Parser Tools, and
       can be used either directly with the other packages of this layer, or
       indirectly through the export manager provided by pt::peg::export. The
       latter is intended for use in untrusted environments and is done
       through the corresponding export plugin pt::peg::export::peg sitting
       between converter and export manager.

       IMAGE: arch_core_eplugins

API

       The API provided by this package satisfies the specification of the
       Converter API found in the Parser Tools Export API specification.

       pt::peg::to::peg reset
              This command resets the configuration of the package to its
              default settings.

       pt::peg::to::peg configure
              This command returns a dictionary containing the current
              configuration of the package.

       pt::peg::to::peg configure option
              This command returns the current value of the specified
              configuration option of the package. For the set of legal
              options, please read the section Options.

       pt::peg::to::peg configure option value...
              This command sets the given configuration options of the
              package to the specified values. For the set of legal options,
              please read the section Options.

       pt::peg::to::peg convert serial
              This command takes the canonical serialization of a parsing
              expression grammar, as specified in section PEG serialization
              format and contained in serial, and generates PEG markup
              encoding the grammar, per the current package configuration.
              The created string is then returned as the result of the
              command.
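
       A minimal usage sketch of the commands described in this section. The
       one-rule grammar, the option values (digits, alice, digits.peg) and
       the variable names are assumptions made for illustration. The
       serialization is built with list and dict create so that its keys are
       already sorted and it carries no superfluous whitespace, since convert
       expects a canonical serialization.

```tcl
package require Tcl 8.5
package require pt::peg::to::peg

# Hypothetical one-rule grammar: Number matches one or more decimal digits.
# Keys are inserted in sorted order (is, mode / rules, start).
set rules  [dict create Number [dict create is {+ ddigit} mode value]]
set serial [list pt::grammar::peg [dict create rules $rules start {n Number}]]

# Start from the default settings, then set several options at once.
pt::peg::to::peg reset
pt::peg::to::peg configure -name digits -user alice -file digits.peg

# Query a single option, then the whole configuration dictionary.
puts [pt::peg::to::peg configure -name]
puts [pt::peg::to::peg configure]

# Generate the PEG markup; the result is returned as a string.
set text [pt::peg::to::peg convert $serial]
puts $text
```

       Because the configuration is package-wide, calling reset first keeps
       settings left over from earlier conversions out of the result.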

OPTIONS

       The converter to the PEG language recognizes the following options and
       changes its behaviour as they specify.

       -file string
              The value of this option is the name of the file or other entity
              from which the grammar came, for which the command is run. The
              default value is unknown.

       -name string
              The value of this option is the name of the grammar we are
              processing. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the
              command is run. The default value is unknown.

       -template string
              The value of this option is a string into which to put the
              generated text and the values of the other options. The various
              locations for user-data are expected to be specified with the
              placeholders listed below. The default value is "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant PEG.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @code@ To be replaced with the generated text.
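
       The placeholder semantics of -template can be pictured with plain
       string substitution. The sketch below is an illustration only and is
       not the package's implementation; the proc name fill_template and the
       sample values are hypothetical.

```tcl
# Illustration only: mimic the documented placeholder semantics with
# [string map]. The proc name fill_template is hypothetical.
proc fill_template {template user file name code} {
    return [string map [list \
        @user@   $user \
        @format@ PEG \
        @file@   $file \
        @name@   $name \
        @code@   $code] $template]
}

# A template that puts a comment header in front of the generated text.
set template "# Grammar @name@ (@format@), from @file@ for @user@\n@code@"
puts [fill_template $template alice calc.peg calculator \
    "PEG calculator (Expression)\nEND;"]
```

       With the default template "@code@" the result is just the generated
       text, which matches the documented default behaviour.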

PEG SPECIFICATION LANGUAGE

       peg, a language for the specification of parsing expression grammars,
       is meant to be human readable, and writable as well, yet strict enough
       to allow its processing by machine, like any computer language. It was
       defined to make writing the specification of a grammar easy, something
       the other formats found in the Parser Tools do not lend themselves to.

       It is formally specified by the grammar shown below, written in itself.
       For a tutorial / introduction to the language please go and read the
       PEG Language Tutorial.

              PEG pe-grammar-for-peg (Grammar)

                      # --------------------------------------------------------------------
                      # Syntactical constructs

                      Grammar         <- WHITESPACE Header Definition* Final EOF ;

                      Header          <- PEG Identifier StartExpr ;
                      Definition      <- Attribute? Identifier IS Expression SEMICOLON ;
                      Attribute       <- (VOID / LEAF) COLON ;
                      Expression      <- Sequence (SLASH Sequence)* ;
                      Sequence        <- Prefix+ ;
                      Prefix          <- (AND / NOT)? Suffix ;
                      Suffix          <- Primary (QUESTION / STAR / PLUS)? ;
                      Primary         <- ALNUM / ALPHA / ASCII / CONTROL / DDIGIT / DIGIT
                                      /  GRAPH / LOWER / PRINTABLE / PUNCT / SPACE / UPPER
                                      /  WORDCHAR / XDIGIT
                                      /  Identifier
                                      /  OPEN Expression CLOSE
                                      /  Literal
                                      /  Class
                                      /  DOT
                                      ;
                      Literal         <- APOSTROPH  (!APOSTROPH  Char)* APOSTROPH  WHITESPACE
                                      /  DAPOSTROPH (!DAPOSTROPH Char)* DAPOSTROPH WHITESPACE ;
                      Class           <- OPENB (!CLOSEB Range)* CLOSEB WHITESPACE ;
                      Range           <- Char TO Char / Char ;

                      StartExpr       <- OPEN Expression CLOSE ;
              void:   Final           <- "END" WHITESPACE SEMICOLON WHITESPACE ;

                      # --------------------------------------------------------------------
                      # Lexing constructs

                      Identifier      <- Ident WHITESPACE ;
              leaf:   Ident           <- ([_:] / <alpha>) ([_:] / <alnum>)* ;
                      Char            <- CharSpecial / CharOctalFull / CharOctalPart
                                      /  CharUnicode / CharUnescaped
                                      ;

              leaf:   CharSpecial     <- "\\" [nrt'"\[\]\\] ;
              leaf:   CharOctalFull   <- "\\" [0-2][0-7][0-7] ;
              leaf:   CharOctalPart   <- "\\" [0-7][0-7]? ;
              leaf:   CharUnicode     <- "\\" 'u' HexDigit (HexDigit (HexDigit HexDigit?)?)? ;
              leaf:   CharUnescaped   <- !"\\" . ;

              void:   HexDigit        <- [0-9a-fA-F] ;

              void:   TO              <- '-'           ;
              void:   OPENB           <- "["           ;
              void:   CLOSEB          <- "]"           ;
              void:   APOSTROPH       <- "'"           ;
              void:   DAPOSTROPH      <- '"'           ;
              void:   PEG             <- "PEG" !([_:] / <alnum>) WHITESPACE ;
              void:   IS              <- "<-"    WHITESPACE ;
              leaf:   VOID            <- "void"  WHITESPACE ; # Implies that definition has no semantic value.
              leaf:   LEAF            <- "leaf"  WHITESPACE ; # Implies that definition has no terminals.
              void:   SEMICOLON       <- ";"     WHITESPACE ;
              void:   COLON           <- ":"     WHITESPACE ;
              void:   SLASH           <- "/"     WHITESPACE ;
              leaf:   AND             <- "&"     WHITESPACE ;
              leaf:   NOT             <- "!"     WHITESPACE ;
              leaf:   QUESTION        <- "?"     WHITESPACE ;
              leaf:   STAR            <- "*"     WHITESPACE ;
              leaf:   PLUS            <- "+"     WHITESPACE ;
              void:   OPEN            <- "("     WHITESPACE ;
              void:   CLOSE           <- ")"     WHITESPACE ;
              leaf:   DOT             <- "."     WHITESPACE ;

              leaf:   ALNUM           <- "<alnum>"    WHITESPACE ;
              leaf:   ALPHA           <- "<alpha>"    WHITESPACE ;
              leaf:   ASCII           <- "<ascii>"    WHITESPACE ;
              leaf:   CONTROL         <- "<control>"  WHITESPACE ;
              leaf:   DDIGIT          <- "<ddigit>"   WHITESPACE ;
              leaf:   DIGIT           <- "<digit>"    WHITESPACE ;
              leaf:   GRAPH           <- "<graph>"    WHITESPACE ;
              leaf:   LOWER           <- "<lower>"    WHITESPACE ;
              leaf:   PRINTABLE       <- "<print>"    WHITESPACE ;
              leaf:   PUNCT           <- "<punct>"    WHITESPACE ;
              leaf:   SPACE           <- "<space>"    WHITESPACE ;
              leaf:   UPPER           <- "<upper>"    WHITESPACE ;
              leaf:   WORDCHAR        <- "<wordchar>" WHITESPACE ;
              leaf:   XDIGIT          <- "<xdigit>"   WHITESPACE ;

              void:   WHITESPACE      <- (" " / "\t" / EOL / COMMENT)* ;
              void:   COMMENT         <- '#' (!EOL .)* EOL ;
              void:   EOL             <- "\n\r" / "\n" / "\r" ;
              void:   EOF             <- !. ;

                      # --------------------------------------------------------------------
              END;

   EXAMPLE
       Our example specifies the grammar for a basic 4-operation calculator.

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       Using higher-level features of the notation, i.e. the character classes
       (predefined and custom), this example can be rewritten as

              PEG calculator (Expression)
                  Sign       <- [-+] ;
                  Number     <- Sign? <ddigit>+;
                  Expression <- '(' Expression ')' / (Factor (MulOp Factor)*);
                  MulOp      <- [*/];
                  Factor     <- Term (AddOp Term)*;
                  AddOp      <- [-+];
                  Term       <- Number;
              END;
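
       As an illustration of the lexical rules in the grammar of this
       section, the sketch below approximates the Ident rule,
       ([_:] / <alpha>) ([_:] / <alnum>)*, with a Tcl regular expression. The
       proc name is_ident is hypothetical, and the classes [:alpha:] and
       [:alnum:] are assumed to be close enough to <alpha> and <alnum> for
       this purpose.

```tcl
# Hypothetical helper: does NAME have the shape required by the Ident rule?
# First character: a letter, underscore, or colon.
# Remaining characters: a letter, digit, underscore, or colon.
proc is_ident {name} {
    return [regexp {^[[:alpha:]_:][[:alnum:]_:]*$} $name]
}

# Try a few candidate nonterminal names, including the empty string.
foreach name {Expression pt::peg _tmp1 9lives {}} {
    puts [format "%-14s %d" "\"$name\"" [is_ident $name]]
}
```

       A check of this kind can be useful before writing nonterminal names
       into PEG markup by hand, since a name that starts with a digit would
       not be read back as an Identifier.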

PEG SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of them
       will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg, and
                     its value. This value holds the contents of the grammar.

              [3]    The contents of the grammar are a Tcl dictionary holding
                     the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal nonterminal
                                   symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three values
                                          specifying how a parser should
                                          handle the semantic value produced
                                          by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are discarded
                                                 (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified in
                            the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in the
                     start expression and on the RHS of the grammar rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally satisfies
              the constraints below, which make it unique among all the
              possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary. I.e. it does not
                     contain superfluous whitespace.

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}}                                                                mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
                      MulOp      {is {/ {t *} {t /}}                                                                mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
                      Sign       {is {/ {t -} {t +}}                                                                mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
                  }
                  start {n Expression}
              }
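
       Since a serialization is a nested Tcl dictionary, it can be inspected
       with the builtin dict command. The sketch below walks a serialization
       of the calculator grammar and checks the key ordering demanded by
       constraint [1] of the canonical form. The helper name keys_sorted is
       hypothetical, and the check covers one dictionary level only.

```tcl
# A (whitespace-laden, hence not canonical) serialization of the
# calculator grammar, written out from its rules.
set serial {
    pt::grammar::peg {
        rules {
            AddOp      {is {/ {t +} {t -}} mode value}
            Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
            Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
            Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
            MulOp      {is {/ {t *} {t /}} mode value}
            Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
            Sign       {is {/ {t -} {t +}} mode value}
            Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
        }
        start {n Expression}
    }
}

# Unwrap the single top-level key, then list each rule with its mode.
set grammar [dict get $serial pt::grammar::peg]
puts "start: [dict get $grammar start]"
dict for {symbol def} [dict get $grammar rules] {
    puts [format "%-10s mode=%-5s is=%s" $symbol \
        [dict get $def mode] [dict get $def is]]
}

# Hypothetical helper: are the keys of one dictionary in the order
# produced by [lsort -increasing -dict]?
proc keys_sorted {d} {
    set keys [dict keys $d]
    return [expr {$keys eq [lsort -increasing -dict $keys]}]
}
puts [keys_sorted [dict get $grammar rules]]
```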

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison, etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression. It
                            matches any character.

                     [3]    The string alnum is an atomic parsing expression.
                            It matches any Unicode alphabet or digit
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing expression.
                            It matches any Unicode alphabet character. This is
                            a custom extension of PEs based on Tcl's builtin
                            command string is.

                     [5]    The string ascii is an atomic parsing expression.
                            It matches any Unicode character below U+0080.
                            This is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing expression.
                            It matches any Unicode digit character. Note that
                            this includes characters outside of the [0..9]
                            range. This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing expression.
                            It matches any Unicode printing character, except
                            for space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [9]    The string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing expression.
                            It matches any Unicode printing character,
                            including space. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [11]   The string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing expression.
                            It matches any Unicode space character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command string is.

                     [13]   The string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word character.
                            This is any alphanumeric character (see alnum),
                            and any connector punctuation characters (e.g.
                            underscore). This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [15]   The string xdigit is an atomic parsing expression.
                            It matches any hexadecimal digit character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [16]   The string ddigit is an atomic parsing expression.
                            It matches any decimal digit character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is a parsing expression as
                            well. This is the ordered choice, aka prioritized
                            choice.

                     [2]    For parsing expressions e1, e2, ... the result of
                            [list x e1 e2 ... ] is a parsing expression as
                            well. This is the sequence.

                     [3]    For a parsing expression e the result of [list *
                            e] is a parsing expression as well. This is the
                            Kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of [list +
                            e] is a parsing expression as well. This is the
                            positive Kleene closure, describing one or more
                            repetitions.

                     [5]    For a parsing expression e the result of [list &
                            e] is a parsing expression as well. This is the
                            and lookahead predicate.

                     [6]    For a parsing expression e the result of [list !
                            e] is a parsing expression as well. This is the
                            not lookahead predicate.

                     [7]    For a parsing expression e the result of [list ?
                            e] is a parsing expression as well. This is the
                            optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among all
              the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

                  Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}
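
       The [list ...] forms used in this section translate directly into Tcl
       code. The sketch below builds the serialization of Term (AddOp Term)*
       with list, which yields a string without superfluous whitespace, and
       then walks it recursively. The helper name pe_nonterminals is
       hypothetical and handles only the operators described in this section.

```tcl
# Build the serialization of: Term (AddOp Term)*
set pe [list x [list n Term] [list * [list x [list n AddOp] [list n Term]]]]
puts $pe

# Hypothetical helper: collect the names of all nonterminals referenced
# by a serialized parsing expression, sorted and without duplicates.
proc pe_nonterminals {pe} {
    # The first list element is the operator, the rest are its arguments.
    set children [lassign $pe op]
    switch -exact -- $op {
        n {
            # Nonterminal reference: the single argument is its name.
            return [list [lindex $children 0]]
        }
        / - x - * - + - & - ! - ? {
            # Combined expression: recurse into every sub-expression.
            set result {}
            foreach child $children {
                lappend result {*}[pe_nonterminals $child]
            }
            return [lsort -unique $result]
        }
        default {
            # Terminals (t) and the named atoms reference no nonterminal.
            return {}
        }
    }
}

puts [pe_nonterminals $pe]
```

       Building expressions with list instead of hand-written strings avoids
       stray whitespace, which matters because the canonical form is compared
       as a string.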

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       EBNF, LL(k), PEG, TDPL, context-free languages, conversion, expression,
       format conversion, grammar, matching, parser, parsing expression,
       parsing expression grammar, push down automaton, recursive descent,
       serialization, state, top-down parsing languages, transducer

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                               1.0.2                 pt::peg::to::peg(n)