pt::peg::to::cparam(n)           Parser Tools           pt::peg::to::cparam(n)



______________________________________________________________________________

NAME

       pt::peg::to::cparam - PEG Conversion. Write CPARAM format

SYNOPSIS

       package require Tcl  8.5

       package require pt::peg::to::cparam  ?1.1.2?

       pt::peg::to::cparam reset

       pt::peg::to::cparam configure

       pt::peg::to::cparam configure option

       pt::peg::to::cparam configure option value...

       pt::peg::to::cparam convert serial

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In that
       case please read the overview provided by the Introduction to Parser
       Tools. That document is the entry point to the whole system the current
       package is a part of.

       This package implements the converter from parsing expression grammars
       to CPARAM markup.

       It resides in the Export section of the Core Layer of Parser Tools, and
       can be used either directly with the other packages of this layer, or
       indirectly through the export manager provided by pt::peg::export. The
       latter is intended for use in untrusted environments and is done
       through the corresponding export plugin pt::peg::export::cparam sitting
       between converter and export manager.

       IMAGE: arch_core_eplugins

API

       The API provided by this package satisfies the specification of the
       Converter API found in the Parser Tools Export API specification.

       pt::peg::to::cparam reset
              This command resets the configuration of the package to its
              default settings.

       pt::peg::to::cparam configure
              This command returns a dictionary containing the current
              configuration of the package.

       pt::peg::to::cparam configure option
              This command returns the current value of the specified
              configuration option of the package. For the set of legal
              options, please read the section Options.

       pt::peg::to::cparam configure option value...
              This command sets the given configuration options of the
              package to the specified values. For the set of legal options,
              please read the section Options.

       pt::peg::to::cparam convert serial
              This command takes the canonical serialization of a parsing
              expression grammar, as specified in section PEG serialization
              format, and contained in serial, and generates CPARAM markup
              encoding the grammar, per the current package configuration.
              The created string is then returned as the result of the
              command.

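
       The commands above can be combined as in the following minimal sketch.
       It assumes Tcllib is installed. The one-rule grammar (A <- 'a') and the
       names tiny and tiny.peg are hypothetical and chosen only for
       illustration.

```tcl
package require Tcl 8.5
package require pt::peg::to::cparam

# Hypothetical one-rule grammar "A <- 'a'", written directly in the
# canonical serialization described in section PEG serialization format.
set serial {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}

# Start from the default settings, then set two of the documented options.
pt::peg::to::cparam reset
pt::peg::to::cparam configure -name tiny -file tiny.peg

# Query a single option.
puts [pt::peg::to::cparam configure -name]

# Generate the C code for the grammar and report how much was produced.
set code [pt::peg::to::cparam convert $serial]
puts "generated [string length $code] characters"
```

       With the default template (@code@) the result is only the generated
       parsing functions, without any surrounding framework.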

OPTIONS

       The converter to C code recognizes the following configuration
       variables and changes its behaviour as they specify.

       -file string
              The value of this option is the name of the file or other entity
              from which the grammar came, for which the command is run. The
              default value is unknown.

       -name string
              The value of this option is the name of the grammar we are
              processing. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the
              command is run. The default value is unknown.

       -template string
              The value of this option is a string into which to put the
              generated text and the other configuration settings. The various
              locations for user-data are expected to be specified with the
              placeholders listed below. The default value is "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant C/PARAM.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @code@ To be replaced with the generated C code.

              The following placeholders are special, in that they will occur
              within the generated code, and are replaced there as well.

              @statedecl@
                     To be replaced with the value of the option -state-decl.

              @stateref@
                     To be replaced with the value of the option -state-ref.

              @strings@
                     To be replaced with the value of the option
                     -string-varname.

              @self@ To be replaced with the value of the option
                     -self-command.

              @def@  To be replaced with the value of the option
                     -fun-qualifier.

              @ns@   To be replaced with the value of the option -namespace.

              @main@ To be replaced with the value of the option -main.

              @prelude@
                     To be replaced with the value of the option -prelude.

       -state-decl string
              A C string representing the argument declaration to use in the
              generated parsing functions to refer to the parsing state. In
              essence type and argument name. The default value is the string
              RDE_PARAM p.

       -state-ref string
              A C string representing the argument name used in the generated
              parsing functions to refer to the parsing state. The default
              value is the string p.

       -self-command string
              A C string representing the reference needed to call the
              generated parser function (methods ...) from another parser
              function, per the chosen framework (template). The default value
              is the empty string.

       -fun-qualifier string
              A C string containing the attributes to give to the generated
              functions (methods ...), per the chosen framework (template).
              The default value is static.

       -namespace string
              The name of the C namespace the parser functions (methods, ...)
              shall reside in, or a general prefix to add to the function
              names. The default value is the empty string.

       -main string
              The name of the main function (method, ...) to be called by the
              chosen framework (template) to start parsing input. The default
              value is __main.

       -string-varname string
              The name of the variable used for the table of strings used by
              the generated parser, i.e. error messages, symbol names, etc.
              The default value is p_string.

       -prelude string
              A snippet of code to be inserted at the head of each generated
              parsing function. The default value is the empty string.

       -indent integer
              The number of characters to indent each line of the generated
              code by. The default value is 0.

       -comments boolean
              A flag controlling the generation of code comments containing
              the original parsing expression a parsing function is for. The
              default value is on.

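
       As a rough illustration of how these options interact, the sketch
       below wraps the generated code in a small header comment via -template
       and adjusts a few of the glue options. The template text, the grammar,
       and all option values (calculator, alice, calc_, calc_parse) are
       made-up examples, not recommended settings.

```tcl
package require Tcl 8.5
package require pt::peg::to::cparam

# Hypothetical template: a C comment built from the documented
# placeholders, followed by the generated code.
set template {/* Grammar @name@ from @file@, generated for @user@ (@format@) */
@code@
}

pt::peg::to::cparam reset
pt::peg::to::cparam configure \
    -template  $template \
    -name      calculator \
    -file      calculator.peg \
    -user      alice \
    -namespace calc_ \
    -main      calc_parse \
    -indent    4 \
    -comments  off

# Hypothetical one-rule grammar "A <- 'a'" in canonical form.
set serial {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}
set code [pt::peg::to::cparam convert $serial]
puts $code
```

       A real framework template would also have to supply the C declarations
       the generated functions rely on, which this sketch leaves out.
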

       While the high parameterizability of this converter, as shown by the
       multitude of options it supports, is an advantage to the advanced user,
       allowing her to customize the output of the converter as needed, a
       novice user will likely not see the forest for the trees.

       To help these latter users an adjunct package is provided, containing a
       canned configuration which will generate immediately useful full
       parsers. It is

       pt::cparam::configuration::critcl
              Generated parsers are embedded into a Critcl-based framework.

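
       A possible way to apply the canned configuration is sketched below.
       The def command and its argument order (class name, package name,
       version, configure command prefix) are not described in this document
       and are stated here as an assumption to be checked against the manual
       page of pt::cparam::configuration::critcl. The names tinyparser and
       1.0 are hypothetical.

```tcl
package require Tcl 8.5
package require pt::peg::to::cparam
package require pt::cparam::configuration::critcl

pt::peg::to::cparam reset

# ASSUMPTION: def <class> <package> <version> <configure command prefix>
# applies the Critcl-specific option values to the converter.
pt::cparam::configuration::critcl def tinyparser tinyparser 1.0 \
    {pt::peg::to::cparam configure}

# Hypothetical one-rule grammar "A <- 'a'" in canonical form.
set serial {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}

# The result is now Critcl-based Tcl/C source rather than bare C functions.
set code [pt::peg::to::cparam convert $serial]
puts "generated [string length $code] characters"
```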

C/PARAM CODE REPRESENTATION OF PARSING EXPRESSION GRAMMARS

       The c format is executable code, a parser for the grammar. The parser
       implementation is written in C and can be tweaked to the users' needs
       through a multitude of options.

       The critcl format, for example, is implemented as a canned
       configuration of these options on top of the generator for c.

       The bulk of such a framework has to be specified through the option
       -template. The additional options

       -fun-qualifier string

       -main string

       -namespace string

       -prelude string

       -self-command string

       -state-decl string

       -state-ref string

       -string-varname string

       provide code snippets which help to glue framework and generated code
       together. Their placeholders are in the generated code. Further the
       options

       -indent integer

       -comments boolean

       allow for the customization of the code indent (default none), and
       whether to generate comments showing the parsing expressions a function
       is for (default on).

   EXAMPLE
       We are forgoing an example of this representation, with apologies. It
       would be way too large for this document.

PEG SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization only exactly one of
       them will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg, and
                     its value. This value holds the contents of the grammar.

              [3]    The contents of the grammar are a Tcl dictionary holding
                     the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal nonterminal
                                   symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three values
                                          specifying how a parser should
                                          handle the semantic value produced
                                          by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are discarded
                                                 (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified in
                            the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in the
                     start expression and on the RHS of the grammar rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally satisfies
              the constraints below, which make it unique among all the
              possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary. I.e. it does not
                     contain superfluous whitespace.

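
       The two constraints can be illustrated with a small pure-Tcl sketch
       that turns a regular serialization into a canonical one by rebuilding
       every dictionary with sorted keys and every parsing expression as a
       pure list. The procedure names canon_pe and canonicalize and the
       sample grammar are hypothetical. The sketch assumes well-formed input
       and performs no validation.

```tcl
package require Tcl 8.5

# Rebuild a parsing expression as a pure Tcl list, recursing into the
# operands of the combining operators.
proc canon_pe {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        / - x - * - + - & - ! - ? {
            set out [list $op]
            foreach child [lrange $pe 1 end] {
                lappend out [canon_pe $child]
            }
            return $out
        }
        default {
            # Atomic expressions: epsilon, dot, {t x}, {n A}, ...
            return [list {*}$pe]
        }
    }
}

# Rebuild the grammar dictionary with keys in "lsort -dict" order.
proc canonicalize {serial} {
    set g     [dict get $serial pt::grammar::peg]
    set rules {}
    foreach sym [lsort -increasing -dict [dict keys [dict get $g rules]]] {
        set def [dict get $g rules $sym]
        lappend rules $sym \
            [list is [canon_pe [dict get $def is]] mode [dict get $def mode]]
    }
    set start [canon_pe [dict get $g start]]
    return [list pt::grammar::peg [list rules $rules start $start]]
}

# A regular, non-canonical serialization: unsorted keys, extra whitespace.
set messy {pt::grammar::peg {start {n   A}  rules {B {mode void is {t   b}}  A {mode value is {x {n B}   {t a}}}}}}

set canonical [canonicalize $messy]
puts $canonical
# → pt::grammar::peg {rules {A {is {x {n B} {t a}} mode value} B {is {t b} mode void}} start {n A}}
```
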

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t -} {t +}}  mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}  mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}  mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}  mode value}
                      MulOp      {is {/ {t *} {t /}}  mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}  mode value}
                      Sign       {is {/ {t -} {t +}}  mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}  mode value}
                  }
                  start {n Expression}
              }

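
       Because the serialization is an ordinary Tcl dictionary, it can be
       inspected with the builtin dict and list commands. The sketch below
       lists the nonterminals of the calculator grammar and derives its
       implicit set of terminal symbols, as described in item [4] of the
       regular serialization. The procedure name terminals is hypothetical.

```tcl
package require Tcl 8.5

# Collect the terminal symbols used in a serialized parsing expression.
proc terminals {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        t { return [list [lindex $pe 1]] }
        / - x - * - + - & - ! - ? {
            set out {}
            foreach child [lrange $pe 1 end] {
                lappend out {*}[terminals $child]
            }
            return $out
        }
        default { return {} }
    }
}

# The calculator grammar from the example above.
set serial {pt::grammar::peg {
    rules {
        AddOp      {is {/ {t -} {t +}} mode value}
        Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
        Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
        Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
        MulOp      {is {/ {t *} {t /}} mode value}
        Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
        Sign       {is {/ {t -} {t +}} mode value}
        Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
    }
    start {n Expression}
}}

set grammar      [dict get $serial pt::grammar::peg]
set nonterminals [dict keys [dict get $grammar rules]]

# Terminals come from the start expression and every rule's right hand side.
set all [terminals [dict get $grammar start]]
dict for {sym def} [dict get $grammar rules] {
    lappend all {*}[terminals [dict get $def is]]
}
set all [lsort -unique $all]

puts $nonterminals   ;# → AddOp Digit Expression Factor MulOp Number Sign Term
puts [llength $all]  ;# → 16
```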

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison, etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization only
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression. It
                            matches any character.

                     [3]    The string alnum is an atomic parsing expression.
                            It matches any Unicode alphabet or digit
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing expression.
                            It matches any Unicode alphabet character. This is
                            a custom extension of PEs based on Tcl's builtin
                            command string is.

                     [5]    The string ascii is an atomic parsing expression.
                            It matches any Unicode character below U0080. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing expression.
                            It matches any Unicode digit character. Note that
                            this includes characters outside of the [0..9]
                            range. This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing expression.
                            It matches any Unicode printing character, except
                            for space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [9]    The string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing expression.
                            It matches any Unicode printing character,
                            including space. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [11]   The string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing expression.
                            It matches any Unicode space character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command string is.

                     [13]   The string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word character.
                            This is any alphanumeric character (see alnum),
                            and any connector punctuation characters (e.g.
                            underscore). This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [15]   The string xdigit is an atomic parsing expression.
                            It matches any hexadecimal digit character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [16]   The string ddigit is an atomic parsing expression.
                            It matches any decimal digit character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is a parsing expression as
                            well. This is the ordered choice, aka prioritized
                            choice.

                     [2]    For parsing expressions e1, e2, ... the result of
                            [list x e1 e2 ... ] is a parsing expression as
                            well. This is the sequence.

                     [3]    For a parsing expression e the result of [list *
                            e] is a parsing expression as well. This is the
                            kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of [list +
                            e] is a parsing expression as well. This is the
                            positive kleene closure, describing one or more
                            repetitions.

                     [5]    For a parsing expression e the result of [list &
                            e] is a parsing expression as well. This is the
                            and lookahead predicate.

                     [6]    For a parsing expression e the result of [list !
                            e] is a parsing expression as well. This is the
                            not lookahead predicate.

                     [7]    For a parsing expression e the result of [list ?
                            e] is a parsing expression as well. This is the
                            optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among all
              the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

                  Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}

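
       Since every constructor in this section is written in terms of Tcl's
       list command, a serialized parsing expression can be built bottom-up
       and taken apart again with lindex. The following sketch reproduces the
       example above. The variable names are hypothetical.

```tcl
package require Tcl 8.5

# Build "Term (AddOp Term)*" from its parts using [list].
set term  [list n Term]
set addop [list n AddOp]
set pe    [list x $term [list * [list x $addop $term]]]

puts $pe
# → x {n Term} {* {x {n AddOp} {n Term}}}

# Take it apart again: operator, then nested operands.
puts [lindex $pe 0]       ;# → x
puts [lindex $pe 2 0]     ;# → *
puts [lindex $pe 2 1 1]   ;# → n AddOp
```

       Building expressions with list rather than by hand-written strings
       yields a pure list without superfluous whitespace, which is what
       constraint [1] of the canonical serialization asks for.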

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       CPARAM, EBNF, LL(k), PEG, TDPL, context-free languages, conversion,
       expression, format conversion, grammar, matching, parser, parsing
       expression, parsing expression grammar, push down automaton, recursive
       descent, serialization, state, top-down parsing languages, transducer

CATEGORY

       Parsing and Grammars

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>



tcllib                               1.1.2              pt::peg::to::cparam(n)