pt_import_api(i)                 Parser Tools                 pt_import_api(i)

______________________________________________________________________________

NAME

       pt_import_api - Parser Tools Import API

SYNOPSIS

       package require Tcl 8.5

       CONVERTER convert text

       IncludeFile currentfile path

       ::import text

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. That document is the entry point to the whole system
       the current package is a part of.

       This document describes two APIs: first, the API shared by all
       packages for the conversion of some other format into Parsing
       Expression Grammars, and then the API shared by the packages which
       implement the import plugins sitting on top of the conversion
       packages.

       Its intended audience is people who wish to create their own
       converter for some type of input, and/or an import plugin for their
       own or some other converter.

       It resides in the Import section of the Core Layer of Parser Tools.

       IMAGE: arch_core_import

CONVERTER API

       Any (grammar) import converter has to follow the rules set out below:

       [1]    A converter is a package. Its name is arbitrary; however, it is
              recommended to put it under the ::pt::peg::from namespace.

       [2]    The package provides either a single Tcl command following the
              API outlined below, or a class command whose instances follow
              the same API. The commands which follow the API are called
              converter commands.

       [3]    A converter command has to provide the following single method
              with the given signature and semantics. Converter commands are
              allowed to provide more methods of their own, but not fewer,
              and they may not provide different semantics for the
              standardized method.

              CONVERTER convert text
                     This method has to accept some text, a parsing
                     expression grammar in some format. The result of the
                     method has to be the canonical serialization of a
                     parsing expression grammar, as specified in section PEG
                     serialization format, the result of reading and
                     converting the input text.
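
       The rules above can be sketched as a single converter command. The
       package name pt::peg::from::pairs and its input format (one
       "NAME CHAR" pair per line, the first name being the start symbol)
       are invented for this illustration and are not part of Parser Tools.
       The sketch only shows the shape of a converter command with a
       convert method that returns a serialization with sorted keys.

```tcl
# Hypothetical converter; the namespace name and input format are assumptions.
namespace eval ::pt::peg::from::pairs {
    namespace export convert
    namespace ensemble create
}

# CONVERTER convert text
# Each non-empty line of text is "NAME CHAR": nonterminal NAME matches the
# single character CHAR. The first nonterminal seen is the start symbol.
proc ::pt::peg::from::pairs::convert {text} {
    set rules {}
    set start epsilon
    foreach line [split $text \n] {
        set line [string trim $line]
        if {$line eq ""} continue
        if {![regexp {^(\S+)\s+(\S)$} $line -> name char]} {
            return -code error "malformed line \"$line\""
        }
        if {[dict size $rules] == 0} {
            set start [list n $name]
        }
        dict set rules $name [list is [list t $char] mode value]
    }
    # Emit the rules with their keys in ascending dictionary order and build
    # every level with [list], which avoids superfluous whitespace.
    set sorted {}
    foreach name [lsort -increasing -dict [dict keys $rules]] {
        lappend sorted $name [dict get $rules $name]
    }
    return [list pt::grammar::peg [list rules $sorted start $start]]
}
```

       With this in place, [::pt::peg::from::pairs convert "B b\nA a"]
       yields a grammar whose rules appear in the order A, B and whose
       start expression is {n B}.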

PLUGIN API

       Any (grammar) import plugin has to follow the rules set out below:

       [1]    A plugin is a package.

       [2]    The name of a plugin package has the form pt::peg::import::FOO,
              where FOO is the name of the format the plugin will accept
              input for.

       [3]    The plugin can expect that the package pt::peg::import::plugin
              is present, as an indicator that it was invoked from a genuine
              plugin manager.

              It is recommended that a plugin check for the presence of this
              package.

       [4]    The plugin can expect that a command named IncludeFile is
              present, with the signature

              IncludeFile currentfile path
                     This command has to be invoked by the plugin when it has
                     to process an included file, if the format has the
                     concept of such.

                     The plugin has to supply the following arguments:

                     string currentfile
                            The path of the file it is currently processing.
                            This may be the empty string if no such path is
                            known.

                     string path
                            The path of the include file as specified in the
                            include directive being processed.

                     The result of the command will be a 5-element list
                     containing

                     [1]    A boolean flag indicating the success (True) or
                            failure (False) of the operation.

                     [2]    In case of success the contents of the included
                            file, and the empty string otherwise.

                     [3]    The resolved, i.e. absolute, path of the included
                            file, if possible, or the unchanged path argument.
                            This is for display in an error message, or as the
                            currentfile argument of another call to
                            IncludeFile should this file include more files.

                     [4]    In case of success an empty string, and for
                            failure a code indicating the reason for it, one
                            of

                            notfound
                                   The specified file could not be found.

                            notread
                                   The specified file was found, but could
                                   not be read into memory.

                     [5]    An empty string in case of success or a notfound
                            failure, and an additional error message
                            describing the reason for a notread error in more
                            detail.

       [5]    A plugin has to provide a single command, in the global
              namespace, with the signature shown below. Plugins are allowed
              to provide more commands of their own, but not fewer, and they
              may not provide different semantics for the standardized
              command.

              ::import text
                     This command has to accept a text containing a parsing
                     expression grammar in some format. The result of the
                     command has to be the result of the converter invoked by
                     the plugin for the input grammar, i.e. the canonical
                     serialization of the parsing expression grammar
                     contained in the input.

                     string text
                            This argument will contain the parsing expression
                            grammar for which to generate the serialization.
                            The specification of what a canonical
                            serialization is can be found in the section PEG
                            serialization format.

       [6]    A single usage cycle of a plugin consists of an invocation of
              the command import. This call has to leave the plugin in a
              state where another usage cycle can be run without problems.
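
       A minimal sketch of a plugin following these rules is shown below.
       The format FOO (one character per line, plus "include PATH" lines),
       the helper names UnderPluginManager, ExpandIncludes and FooConvert,
       and the stand-in IncludeFile are all assumptions made for
       illustration. A real plugin gets IncludeFile from its plugin manager
       and calls a real converter.

```tcl
# Sketch of a plugin for an imaginary format FOO (all helper names are
# hypothetical). FOO text holds one character per line, or "include PATH".

# True if the marker package of a genuine plugin manager is present.
proc UnderPluginManager {} {
    expr {![catch {package present pt::peg::import::plugin}]}
}

# Stand-in for the manager-provided IncludeFile, defined only when missing,
# so that the sketch also runs outside of a plugin manager.
if {![llength [info commands ::IncludeFile]]} {
    proc ::IncludeFile {currentfile path} {
        if {$path eq "more.foo"} {
            return [list 1 "b" /abs/more.foo {} {}]
        }
        return [list 0 {} $path notfound {}]
    }
}

# Replace each include directive by the contents of the included file.
proc ExpandIncludes {currentfile text} {
    set out {}
    foreach line [split $text \n] {
        if {[regexp {^include\s+(\S+)$} [string trim $line] -> path]} {
            lassign [IncludeFile $currentfile $path] ok contents resolved code msg
            if {!$ok} {
                return -code error "cannot include \"$resolved\": $code $msg"
            }
            # The resolved path becomes currentfile for nested includes.
            lappend out [ExpandIncludes $resolved $contents]
        } else {
            lappend out $line
        }
    }
    return [join $out \n]
}

# Stand-in converter: the start expression is the sequence of all characters.
# It assumes the text holds at least one character line.
proc FooConvert {text} {
    set seq [list x]
    foreach line [split $text \n] {
        set char [string trim $line]
        if {$char ne ""} { lappend seq [list t $char] }
    }
    return [list pt::grammar::peg [list rules {} start $seq]]
}

# The one command every plugin has to provide. It keeps no state between
# calls, so another usage cycle can follow immediately.
proc ::import {text} {
    return [FooConvert [ExpandIncludes "" $text]]
}
```

       The empty string passed as currentfile reflects that the plugin
       receives text, not a path, so no current file is known at the top
       level.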

USAGE

       To use a converter do

                  # Get the converter (single command here, not class)
                  package require the-converter-package

                  # Perform the conversion
                  set serial [theconverter convert $thegrammartext]

                  # ... process the result ...

       To use a plugin FOO do

                  # Get an import plugin manager
                  package require pt::peg::import
                  pt::peg::import I

                  # Run the plugin, and the converter inside.
                  set serial [I import serial $thegrammartext FOO]

                  # ... process the result ...

PEG SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of them
       will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg, and
                     its value. This value holds the contents of the grammar.

              [3]    The contents of the grammar are a Tcl dictionary holding
                     the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal nonterminal
                                   symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three values
                                          specifying how a parser should
                                          handle the semantic value produced
                                          by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are discarded
                                                 (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified in
                            the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in the
                     start expression and on the RHS of the grammar rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally satisfies
              the constraints below, which make it unique among all the
              possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary. I.e. it does not
                     contain superfluous whitespace.
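
       The step from a regular to a canonical serialization can be sketched
       as follows. The command name CanonicalPeg is hypothetical. The sketch
       assumes that the parsing expressions stored under is and start are
       already canonical and copies them unchanged. It only reorders the
       dictionary keys and rebuilds the dictionary levels with list.

```tcl
# Hypothetical helper: turn a regular PEG serialization into one with
# sorted keys and no superfluous whitespace at the dictionary levels.
# The nested parsing expressions are copied as they are (assumed canonical).
proc CanonicalPeg {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set in      [dict get $grammar rules]

    set rules {}
    foreach name [lsort -increasing -dict [dict keys $in]] {
        set def [dict get $in $name]
        # "is" sorts before "mode", so this order is already ascending.
        lappend rules $name \
            [list is [dict get $def is] mode [dict get $def mode]]
    }

    # "rules" sorts before "start".
    return [list pt::grammar::peg \
                [list rules $rules start [dict get $grammar start]]]
}
```

       Applied to a serialization that lists start before rules, or the
       nonterminals out of order, the result has the keys in the order
       required by constraint [1].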

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t -} {t +}}                                                                mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
                      MulOp      {is {/ {t *} {t /}}                                                                mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
                      Sign       {is {/ {t -} {t +}}                                                                mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
                  }
                  start {n Expression}
              }
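
       Because the serialization is a nested Tcl dictionary, its parts can
       be reached with the builtin dict command. The sketch below uses a
       trimmed-down grammar holding only the AddOp, MulOp and Sign rules
       from the example. The start symbol Sign and the helper name
       RuleModes are choices made for this illustration.

```tcl
# A cut-down variant of the example grammar (three rules only).
set serial {pt::grammar::peg {rules {AddOp {is {/ {t -} {t +}} mode value} MulOp {is {/ {t *} {t /}} mode value} Sign {is {/ {t -} {t +}} mode value}} start {n Sign}}}

# Peel off the single outer key to get at the grammar contents.
set grammar [dict get $serial pt::grammar::peg]

# Names of all nonterminal symbols, in the order they are stored.
set names [dict keys [dict get $grammar rules]]

# Nested keys can be given directly: the parsing expression of MulOp.
set mulop [dict get $grammar rules MulOp is]

# The start expression of the grammar.
set start [dict get $grammar start]

# Hypothetical helper: a flat list "name mode name mode ..." for all rules.
proc RuleModes {serial} {
    set result {}
    dict for {name def} [dict get $serial pt::grammar::peg rules] {
        lappend result $name [dict get $def mode]
    }
    return $result
}
```

       Here names holds the three nonterminals, mulop is a three-element
       list whose first element is the choice operator /, and start is the
       list {n Sign}.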

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison,
       etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression. It
                            matches any character.

                     [3]    The string alnum is an atomic parsing expression.
                            It matches any Unicode alphabet or digit
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing expression.
                            It matches any Unicode alphabet character. This is
                            a custom extension of PEs based on Tcl's builtin
                            command string is.

                     [5]    The string ascii is an atomic parsing expression.
                            It matches any Unicode character below U+0080.
                            This is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing expression.
                            It matches any Unicode digit character. Note that
                            this includes characters outside of the [0..9]
                            range. This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing expression.
                            It matches any Unicode printing character, except
                            for space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [9]    The string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing expression.
                            It matches any Unicode printing character,
                            including space. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [11]   The string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing expression.
                            It matches any Unicode space character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command string is.

                     [13]   The string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word character.
                            This is any alphanumeric character (see alnum),
                            and any connector punctuation characters (e.g.
                            underscore). This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [15]   The string xdigit is an atomic parsing expression.
                            It matches any hexadecimal digit character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [16]   The string ddigit is an atomic parsing expression.
                            It matches any decimal digit character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is a parsing expression as
                            well. This is the ordered choice, aka prioritized
                            choice.

                     [2]    For parsing expressions e1, e2, ... the result of
                            [list x e1 e2 ... ] is a parsing expression as
                            well. This is the sequence.

                     [3]    For a parsing expression e the result of [list *
                            e] is a parsing expression as well. This is the
                            Kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of [list +
                            e] is a parsing expression as well. This is the
                            positive Kleene closure, describing one or more
                            repetitions.

                     [5]    For a parsing expression e the result of [list &
                            e] is a parsing expression as well. This is the
                            and lookahead predicate.

                     [6]    For a parsing expression e the result of [list !
                            e] is a parsing expression as well. This is the
                            not lookahead predicate.

                     [7]    For a parsing expression e the result of [list ?
                            e] is a parsing expression as well. This is the
                            optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among all
              the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).
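
       Since every parsing expression is a Tcl list whose first element
       names the operator, a serialization can be processed with a small
       recursive command. The sketch below collects the nonterminals
       referenced by an expression. The command name PeNonterminals is
       hypothetical, and any operator not listed above is treated like an
       atomic expression.

```tcl
# Hypothetical helper: list the nonterminal symbols used in a serialized
# parsing expression, each once, in order of first appearance.
proc PeNonterminals {pe} {
    switch -exact -- [lindex $pe 0] {
        n {
            # Reference to a nonterminal: {n A}
            return [list [lindex $pe 1]]
        }
        t {
            # Terminal: {t x}
            return {}
        }
        / - x - * - + - & - ! - ? {
            # Combined expressions: recurse into all operands.
            set result {}
            foreach sub [lrange $pe 1 end] {
                foreach name [PeNonterminals $sub] {
                    if {$name ni $result} { lappend result $name }
                }
            }
            return $result
        }
        default {
            # epsilon, dot, alnum, ... carry no operands.
            return {}
        }
    }
}
```

       For the expression {x {n Term} {* {x {n AddOp} {n Term}}}} this
       returns the two names Term and AddOp.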

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

                  Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}
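
       One way to obtain a string without superfluous whitespace is to
       build the expression with the list command at every level, as the
       [list ...] notation used in this section suggests. A short sketch
       for the rule above follows. The variable names are arbitrary.

```tcl
# Build "Term (AddOp Term)*" bottom-up, using [list] at every level.
set term  [list n Term]
set addop [list n AddOp]
set tail  [list * [list x $addop $term]]
set pe    [list x $term $tail]
```

       The string value of pe is then the text shown in the example,
       without the outermost pair of braces, which only appears when the
       expression is itself nested inside another list.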

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
       matching, parser, parsing expression, parsing expression grammar, push
       down automaton, recursive descent, state, top-down parsing languages,
       transducer

CATEGORY

       Parsing and Grammars

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1                      pt_import_api(i)