pt_export_api(i)                 Parser Tools                 pt_export_api(i)

______________________________________________________________________________

NAME

       pt_export_api - Parser Tools Export API

SYNOPSIS

       package require Tcl 8.5

       CONVERTER reset

       CONVERTER configure

       CONVERTER configure option

       CONVERTER configure option value...

       CONVERTER convert serial

       ::export serial configuration

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. That document is the entry point to the whole system
       the current package is a part of.

       This document describes two APIs: first, the API shared by all
       packages which convert Parsing Expression Grammars into some other
       format; second, the API shared by the packages which implement the
       export plugins sitting on top of the conversion packages.

       Its intended audience is people who wish to create their own
       converter for some type of output, and/or an export plugin for their
       own or some other converter.

       It resides in the Export section of the Core Layer of Parser Tools.

       IMAGE: arch_core_export

CONVERTER API

       Any (grammar) export converter has to follow the rules set out below:

       [1]    A converter is a package. Its name is arbitrary; however, it is
              recommended to put it under the ::pt::peg::to namespace.

       [2]    The package provides either a single Tcl command following the
              API outlined below, or a class command whose instances follow
              the same API. The commands which follow the API are called
              converter commands.

       [3]    A converter command has to provide the following three methods
              with the given signatures and semantics. Converter commands are
              allowed to provide more methods of their own, but not fewer,
              and they may not provide different semantics for the
              standardized methods.

              CONVERTER reset
                     This method has to reset the configuration of the
                     converter to its default settings. The result of the
                     method has to be the empty string.

              CONVERTER configure
                     This method, in this form, has to return a dictionary
                     containing the current configuration of the converter.

              CONVERTER configure option
                     This method, in this form, has to return the current
                     value of the specified configuration option of the
                     converter.

                     Please read the section Options for the set of standard
                     options any converter has to accept. Any other options
                     accepted by a specific converter will be described in
                     its manpage.

              CONVERTER configure option value...
                     This method, in this form, has to set the specified
                     options of the converter to the given values.

                     Please read the section Options for the set of standard
                     options a converter has to accept. Any other options
                     accepted by a specific converter will be described in
                     its manpage.

              CONVERTER convert serial
                     This method has to accept the canonical serialization of
                     a parsing expression grammar, as specified in section
                     PEG serialization format, and contained in serial. The
                     result of the method has to be the result of converting
                     the input grammar into whatever the converter is for,
                     per its configuration.
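
       The three methods above can be sketched as a namespace ensemble. This
       is only an illustration under stated assumptions: the package name
       pt::peg::to::demo, the error messages, the empty result of the
       setting form of configure, and the "conversion" itself (which merely
       lists the rule names) are all hypothetical and not part of Parser
       Tools. A real converter would also call package provide.

```tcl
package require Tcl 8.5

# Hypothetical converter command ::pt::peg::to::demo (illustrative name).
namespace eval ::pt::peg::to::demo {
    namespace export reset configure convert
    namespace ensemble create

    # Defaults of the three standard options, see section Options.
    variable defaults {-file unknown -name a_pe_grammar -user unknown}
    variable config   $defaults
}

# CONVERTER reset: back to the defaults, result is the empty string.
proc ::pt::peg::to::demo::reset {} {
    variable defaults
    variable config $defaults
    return
}

# CONVERTER configure ?option? ?value ...?
proc ::pt::peg::to::demo::configure {args} {
    variable config
    if {[llength $args] == 0} {
        # Query form: the whole configuration as a dictionary.
        return $config
    }
    if {[llength $args] == 1} {
        # Query form: the value of a single option.
        set option [lindex $args 0]
        if {![dict exists $config $option]} {
            return -code error "unknown option \"$option\""
        }
        return [dict get $config $option]
    }
    if {[llength $args] % 2 != 0} {
        return -code error "missing value for option \"[lindex $args end]\""
    }
    # Setting form: option/value pairs.
    foreach {option value} $args {
        if {![dict exists $config $option]} {
            return -code error "unknown option \"$option\""
        }
        dict set config $option $value
    }
    return
}

# CONVERTER convert serial: stand-in conversion, reports the rule names.
proc ::pt::peg::to::demo::convert {serial} {
    variable config
    set grammar [dict get $serial pt::grammar::peg]
    set names   [lsort -dictionary [dict keys [dict get $grammar rules]]]
    return "grammar [dict get $config -name]: [join $names {, }]"
}

# Usage with a one-rule grammar.
set serial {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}
::pt::peg::to::demo configure -name tiny
puts [::pt::peg::to::demo convert $serial]
```

       The sketch keeps the configuration in a single dictionary so that
       reset and the query forms of configure stay trivial.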

PLUGIN API

       Any (grammar) export plugin has to follow the rules set out below:

       [1]    A plugin is a package.

       [2]    The name of a plugin package has the form pt::peg::export::FOO,
              where FOO is the name of the format the plugin will generate
              output for.

       [3]    The plugin can expect that the package pt::peg::export::plugin
              is present, as an indicator that it was invoked from a genuine
              plugin manager.

              It is recommended that a plugin checks for the presence of
              this package.

       [4]    A plugin has to provide a single command, in the global
              namespace, with the signature shown below. Plugins are allowed
              to provide more commands of their own, but not fewer, and they
              may not provide different semantics for the standardized
              command.

              ::export serial configuration
                     This command has to accept the canonical serialization of
                     a parsing expression grammar and the configuration for
                     the converter invoked by the plugin. The result of the
                     command has to be the result of the converter invoked by
                     the plugin for the input grammar and configuration.

                     string serial
                            This argument will contain the canonical
                            serialization of the parsing expression grammar
                            for which to generate the output. The
                            specification of what a canonical serialization
                            is can be found in the section PEG serialization
                            format.

                     dictionary configuration
                            This argument will contain the configuration to
                            configure the converter with before invoking it,
                            as a dictionary mapping from options to values.

                            Please read the section Options for the set of
                            standard options any converter has to accept, and
                            thus any plugin as well. Any other options
                            accepted by a specific plugin will be described
                            in its manpage.

       [5]    A single usage cycle of a plugin consists of an invocation of
              the command export. This call has to leave the plugin in a state
              where another usage cycle can be run without problems.
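
       A plugin following the rules above might look roughly like the sketch
       below. The stand-in converter ::demo::converter and its output are
       hypothetical and exist only to keep the sketch self-contained. A real
       plugin would load an actual converter package, and would probably
       fail outright when pt::peg::export::plugin is missing; here the check
       only records its outcome so the code can also run outside a plugin
       manager.

```tcl
package require Tcl 8.5

# Recommended check, rule [3]. Non-fatal here, as an assumption of the
# sketch, so that it also runs standalone.
set managed [expr {![catch {package present pt::peg::export::plugin}]}]

# Hypothetical stand-in converter following the converter API.
namespace eval ::demo::converter {
    namespace export reset configure convert
    namespace ensemble create
    variable config {}
}
proc ::demo::converter::reset {} {
    variable config {}
    return
}
proc ::demo::converter::configure {args} {
    variable config
    foreach {option value} $args {
        dict set config $option $value
    }
    return
}
proc ::demo::converter::convert {serial} {
    variable config
    # Stand-in result: the configuration plus the list of rule names.
    return [list $config [dict keys [dict get $serial pt::grammar::peg rules]]]
}

# The single command every plugin has to provide, rule [4].
proc ::export {serial configuration} {
    # Resetting first leaves nothing behind from earlier calls, rule [5].
    ::demo::converter reset
    ::demo::converter configure {*}$configuration
    return [::demo::converter convert $serial]
}

# Usage: two independent cycles.
set serial {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}
set first  [::export $serial {-name tiny -user alice}]
set second [::export $serial {}]
```

       Resetting the converter at the start of every call is one simple way
       to make usage cycles independent of each other.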

OPTIONS

       Each export converter, and each plugin for an export converter, has to
       accept the options below: converters through their configure method,
       plugins through their configuration argument. Converters are allowed
       to ignore the contents of these options when performing a conversion,
       but they must not reject them. Plugins are expected to pass the
       options given to them to the converter they are invoking.

       -file string
              The value of this option is the name of the file or other entity
              from which the grammar came, for which the command is run. The
              default value is unknown.

       -name string
              The value of this option is the name of the grammar we are
              processing. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the
              command is run. The default value is unknown.

USAGE

       To use a converter, do:

                  # Get the converter (single command here, not class)
                  package require the-converter-package

                  # Provide a configuration
                  theconverter configure ...

                  # Perform the conversion
                  set result [theconverter convert $thegrammarserial]

                  # ... process the result ...

       To use a plugin FOO, do:

                  # Get an export plugin manager
                  package require pt::peg::export
                  pt::peg::export E

                  # Provide a configuration
                  E configuration set ...

                  # Run the plugin, and the converter inside.
                  set result [E export serial $grammarserial FOO]

                  # ... process the result ...

PEG SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of them
       will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg, and
                     its value. This value holds the contents of the grammar.

              [3]    The contents of the grammar are a Tcl dictionary holding
                     the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal nonterminal
                                   symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three values
                                          specifying how a parser should
                                          handle the semantic value produced
                                          by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are discarded
                                                 (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified in
                            the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in the
                     start expression and on the RHS of the grammar rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally satisfies
              the constraints below, which make it unique among all the
              possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dictionary.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary. I.e. it does not
                     contain superfluous whitespace.
   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t -} {t +}} mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
                      MulOp      {is {/ {t *} {t /}} mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
                      Sign       {is {/ {t -} {t +}} mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
                  }
                  start {n Expression}
              }
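
       Because the serialization is a nested Tcl dictionary, it can be
       inspected with the builtin dict command. The sketch below uses a
       two-rule fragment in the format described above. The helper
       keysSorted is hypothetical and covers only the key-ordering
       constraint of the canonical form, not the whitespace constraint.

```tcl
package require Tcl 8.5

# A small grammar in the serialization format (two rules only).
set serial {pt::grammar::peg {rules {AddOp {is {/ {t -} {t +}} mode value} Sign {is {/ {t -} {t +}} mode value}} start {n AddOp}}}

# Unpack the single top-level key, then the grammar contents.
set grammar [dict get $serial pt::grammar::peg]
set rules   [dict get $grammar rules]

foreach symbol [dict keys $rules] {
    puts "$symbol ([dict get $rules $symbol mode]): [dict get $rules $symbol is]"
}
puts "start: [dict get $grammar start]"

# Hypothetical helper: are the keys of a dictionary in the order
# produced by lsort -increasing -dictionary?
proc keysSorted {dictionary} {
    set keys [dict keys $dictionary]
    return [expr {$keys eq [lsort -increasing -dictionary $keys]}]
}

# Check the ordering constraint on every nesting level.
set ordered [expr {[keysSorted $grammar] && [keysSorted $rules]}]
foreach symbol [dict keys $rules] {
    set ordered [expr {$ordered && [keysSorted [dict get $rules $symbol]]}]
}
puts "keys in canonical order: $ordered"
```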

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison,
       etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression. It
                            matches any character.

                     [3]    The string alnum is an atomic parsing expression.
                            It matches any Unicode alphabet or digit
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing expression.
                            It matches any Unicode alphabet character. This is
                            a custom extension of PEs based on Tcl's builtin
                            command string is.

                     [5]    The string ascii is an atomic parsing expression.
                            It matches any Unicode character below U+0080.
                            This is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing expression.
                            It matches any Unicode digit character. Note that
                            this includes characters outside of the [0..9]
                            range. This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing expression.
                            It matches any Unicode printing character, except
                            for space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [9]    The string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing expression.
                            It matches any Unicode printing character,
                            including space. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [11]   The string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing expression.
                            It matches any Unicode space character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command string is.

                     [13]   The string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word character.
                            This is any alphanumeric character (see alnum),
                            and any connector punctuation characters (e.g.
                            underscore). This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [15]   The string xdigit is an atomic parsing expression.
                            It matches any hexadecimal digit character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [16]   The string ddigit is an atomic parsing expression.
                            It matches any decimal digit character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.
              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is a parsing expression as
                            well. This is the ordered choice, aka prioritized
                            choice.

                     [2]    For parsing expressions e1, e2, ... the result of
                            [list x e1 e2 ... ] is a parsing expression as
                            well. This is the sequence.

                     [3]    For a parsing expression e the result of [list *
                            e] is a parsing expression as well. This is the
                            Kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of [list +
                            e] is a parsing expression as well. This is the
                            positive Kleene closure, describing one or more
                            repetitions.

                     [5]    For a parsing expression e the result of [list &
                            e] is a parsing expression as well. This is the
                            and lookahead predicate.

                     [6]    For a parsing expression e the result of [list !
                            e] is a parsing expression as well. This is the
                            not lookahead predicate.

                     [7]    For a parsing expression e the result of [list ?
                            e] is a parsing expression as well. This is the
                            optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among all
              the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).
   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

                  Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}
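
       The list constructors given above can be used directly to build such
       a value, and a small recursive procedure is enough to walk it. The
       helper nonterminals below is hypothetical and not part of Parser
       Tools; it treats every operator other than n and t as a combinator
       whose remaining elements are sub-expressions, which also covers the
       argument-less atoms such as epsilon or dot.

```tcl
package require Tcl 8.5

# Build  Term (AddOp Term)*  with the constructors listed above.
set pe [list x [list n Term] [list * [list x [list n AddOp] [list n Term]]]]
puts $pe

# Hypothetical helper: the sorted set of nonterminals a PE refers to.
proc nonterminals {pe} {
    # First element is the operator, the rest are its arguments.
    set arguments [lassign $pe operator]
    switch -exact -- $operator {
        n { return [list [lindex $arguments 0]] }
        t { return {} }
        default {
            set result {}
            foreach sub $arguments {
                lappend result {*}[nonterminals $sub]
            }
            return [lsort -unique $result]
        }
    }
}

puts [nonterminals $pe]
```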

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
       matching, parser, parsing expression, parsing expression grammar, push
       down automaton, recursive descent, state, top-down parsing languages,
       transducer

CATEGORY

       Parsing and Grammars

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1                      pt_export_api(i)