pt::peg::import(n)               Parser Tools               pt::peg::import(n)

______________________________________________________________________________

NAME

       pt::peg::import - PEG Import

SYNOPSIS

       package require Tcl 8.5

       package require snit

       package require fileutil::paths

       package require pt::peg

       package require pluginmgr

       package require pt::peg::import ?1.0.1?

       ::pt::peg::import objectName

       objectName method ?arg arg ...?

       objectName destroy

       objectName import text text ?format?

       objectName import file path ?format?

       objectName import object text object text ?format?

       objectName import object file object path ?format?

       objectName includes

       objectName include add path

       objectName include remove path

       objectName include clear

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In that
       case please read the overview provided by the Introduction to Parser
       Tools. That document is the entry point to the whole system the current
       package is a part of.

       This package provides a manager for parsing expression grammars, with
       each instance handling a set of plugins for importing grammars from
       other formats, i.e. converting them from, for example, peg, container,
       or json.

       It resides in the Import section of the Core Layer of Parser Tools, and
       is one of the three pillars the management of parsing expression
       grammars rests on.

       IMAGE: arch_core_import

       The other two pillars are, as shown above,

       [1]    PEG Export, and

       [2]    PEG Storage

       For information about the data structure which is the major output of
       the manager objects provided by this package see the section PEG
       serialization format.

       The plugin system of our class is based on the package pluginmgr, and
       configured to look for plugins using

       [1]    the environment variable GRAMMAR_PEG_IMPORT_PLUGINS,

       [2]    the environment variable GRAMMAR_PEG_PLUGINS,

       [3]    the environment variable GRAMMAR_PLUGINS,

       [4]    the path "~/.grammar/peg/import/plugin"

       [5]    the path "~/.grammar/peg/plugin"

       [6]    the path "~/.grammar/plugin"

       [7]    the path "~/.grammar/peg/import/plugins"

       [8]    the path "~/.grammar/peg/plugins"

       [9]    the path "~/.grammar/plugins"

       [10]   the registry entry
              "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PEG\IMPORT\PLUGINS"

       [11]   the registry entry
              "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PEG\PLUGINS"

       [12]   the registry entry "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PLUGINS"

       The last three are used only when the package is run on a machine using
       the Windows(tm) operating system.

       The whole system is delivered with three predefined import plugins,
       namely

       container
              See PEG Import Plugin. From CONTAINER format for details.

       json   See PEG Import Plugin. From JSON format for details.

       peg    See PEG Import Plugin. From PEG format for details.

       For readers wishing to write their own import plugin for some format,
       i.e. plugin writers, reading and understanding the Parser Tools Import
       API specification is an absolute necessity, as it documents the
       interaction between this package and its plugins in detail.

API

   PACKAGE COMMANDS
       ::pt::peg::import objectName
              This command creates a new import manager object with an
              associated Tcl command whose name is objectName. This object
              command is explained in full detail in the sections Object
              command and Object methods. The object command will be created
              under the current namespace if the objectName is not fully
              qualified, and in the specified namespace otherwise.

   OBJECT COMMAND
       All objects created by the ::pt::peg::import command have the following
       general form:

       objectName method ?arg arg ...?
              The method method and its arguments determine the exact behavior
              of the command. See section Object methods for the detailed
              specifications.

   OBJECT METHODS
       objectName destroy
              This method destroys the object it is invoked for.

       objectName import text text ?format?
              This method takes the text and converts it from the specified
              format to the canonical serialization of a parsing expression
              grammar using the import plugin for the format. An error is
              thrown if no plugin could be found for the format. The
              serialization generated by the conversion process is returned as
              the result of this method.

              If no format is specified the method defaults to text.

              The specification of what a canonical serialization is can be
              found in the section PEG serialization format.

              The plugin has to conform to the interface documented in the
              Parser Tools Import API specification.

       objectName import file path ?format?
              This method is a convenience wrapper around the import text
              method described by the previous item. It reads the contents of
              the specified file into memory, feeds the result into import
              text and returns the resulting serialization as its own result.

       objectName import object text object text ?format?
              This method is a convenience wrapper around the import text
              method described above. It expects that object is an object
              command supporting a deserialize method expecting the canonical
              serialization of a parsing expression grammar. It imports the
              text using import text and then feeds the resulting
              serialization into the object via deserialize. This method
              returns the empty string as its result.

       objectName import object file object path ?format?
              This method behaves like import object text, except that it
              reads the text to convert from the specified file instead of
              being given it as argument.

       objectName includes
              This method returns a list containing the currently specified
              paths to use to search for include files when processing input.
              The order of paths in the list corresponds to the order in which
              they are used, from first to last, and also corresponds to the
              order in which they were added to the object.

       objectName include add path
              This method adds the specified path to the list of paths to use
              to search for include files when processing input. The path is
              added to the end of the list, causing it to be searched after
              all previously added paths. The result of the command is the
              empty string.

              The method does nothing if the path is already known.

       objectName include remove path
              This method removes the specified path from the list of paths
              to use to search for include files when processing input. The
              result of the command is the empty string.

              The method does nothing if the path is not known.

       objectName include clear
              This method clears the list of paths to use to search for
              include files when processing input. The result of the command
              is the empty string.
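
   EXAMPLE
       The methods above might be combined as in the following minimal
       sketch. It assumes that Tcllib, including the pt modules and the peg
       import plugin, is installed. The object name importer and the small
       grammar are made up for illustration, and the format is always named
       explicitly instead of relying on the default.

```tcl
# Sketch only: assumes Tcllib (with the pt modules) is installed.
package require Tcl 8.5
package require pt::peg::import

# A small grammar in PEG format. The grammar itself is illustrative.
set text [string trim {
PEG number (Number)
    Digit  <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
    Sign   <- '-' / '+'                               ;
    Number <- Sign? Digit+                            ;
END;
}]

# Create an import manager. "importer" is an arbitrary object name.
::pt::peg::import importer

# Register a search path for include files, then list the known paths.
importer include add [pwd]
puts "include paths: [importer includes]"

# Convert the text with the peg plugin, naming the format explicitly.
set serial [importer import text $text peg]

# The result is a dictionary with the single key pt::grammar::peg.
set grammar [dict get $serial pt::grammar::peg]
puts "start: [dict get $grammar start]"
puts "rules: [dict keys [dict get $grammar rules]]"

importer destroy
```

       The same conversion for a grammar stored in a file would use import
       file with a path in place of the text.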

PEG SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of them
       will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg, and
                     its value. This value holds the contents of the grammar.

              [3]    The contents of the grammar are a Tcl dictionary holding
                     the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal nonterminal
                                   symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three values
                                          specifying how a parser should
                                          handle the semantic value produced
                                          by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are discarded
                                                 (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified in
                            the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in the
                     start expression and on the RHS of the grammar rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally satisfies
              the constraints below, which make it unique among all the
              possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary. I.e. it does not
                     contain superfluous whitespace.

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}} mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
                      MulOp      {is {/ {t *} {t /}} mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
                      Sign       {is {/ {t -} {t +}} mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
                  }
                  start {n Expression}
              }
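
       The two constraints of the canonical serialization can be illustrated
       with plain Tcl. The following is a minimal sketch, not the code used
       by the package: the helper names canonicalPE and canonicalPEG and the
       sample grammar are made up for illustration. It assumes a structurally
       valid regular serialization as input and performs no validation.

```tcl
package require Tcl 8.5

# A regular serialization: valid structure, but with unsorted keys and
# free-form whitespace.
set regular {
    pt::grammar::peg {
        start {n Number}
        rules {
            Sign   {mode value   is {/ {t -} {t +}}}
            Number {is {x {? {n Sign}}   {+ {n Digit}}} mode value}
            Digit  {is {/ {t 0} {t 1}} mode value}
        }
    }
}

# Rebuild a parsing expression as a pure Tcl list. Operators taking
# parsing expressions as arguments are processed recursively; everything
# else (epsilon, dot, {t x}, {n A}, ...) is treated as atomic.
proc canonicalPE {pe} {
    set op [lindex $pe 0]
    if {$op ni {/ x * + & ! ?}} {
        return [list {*}$pe]
    }
    set result [list $op]
    foreach child [lrange $pe 1 end] {
        lappend result [canonicalPE $child]
    }
    return $result
}

# Rebuild the grammar dictionary with all keys in the order produced by
# "lsort -increasing -dict". Freshly built dictionaries and lists have a
# string representation without superfluous whitespace.
proc canonicalPEG {serial} {
    set grammar [dict get $serial pt::grammar::peg]
    set in      [dict get $grammar rules]
    set rules   [dict create]
    foreach symbol [lsort -increasing -dict [dict keys $in]] {
        set def [dict get $in $symbol]
        dict set rules $symbol [dict create \
            is   [canonicalPE [dict get $def is]] \
            mode [dict get $def mode]]
    }
    return [dict create pt::grammar::peg [dict create \
        rules $rules \
        start [canonicalPE [dict get $grammar start]]]]
}

puts [canonicalPEG $regular]
```

       The keys rules and start, and is and mode, are written out in sorted
       order by hand here, since the set of keys at these levels is fixed.
       Only the nonterminal names have to be sorted at runtime.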

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison, etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression. It
                            matches any character.

                     [3]    The string alnum is an atomic parsing expression.
                            It matches any Unicode alphabet or digit
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing expression.
                            It matches any Unicode alphabet character. This is
                            a custom extension of PEs based on Tcl's builtin
                            command string is.

                     [5]    The string ascii is an atomic parsing expression.
                            It matches any Unicode character below U+0080.
                            This is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing expression.
                            It matches any Unicode digit character. Note that
                            this includes characters outside of the [0..9]
                            range. This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing expression.
                            It matches any Unicode printing character, except
                            for space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [9]    The string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing expression.
                            It matches any Unicode printing character,
                            including space. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [11]   The string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing expression.
                            It matches any Unicode space character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command string is.

                     [13]   The string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word character.
                            This is any alphanumeric character (see alnum),
                            and any connector punctuation characters (e.g.
                            underscore). This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [15]   The string xdigit is an atomic parsing expression.
                            It matches any hexadecimal digit character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [16]   The string ddigit is an atomic parsing expression.
                            It matches any decimal digit character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is a parsing expression as
                            well. This is the ordered choice, aka prioritized
                            choice.

                     [2]    For parsing expressions e1, e2, ... the result of
                            [list x e1 e2 ... ] is a parsing expression as
                            well. This is the sequence.

                     [3]    For a parsing expression e the result of [list *
                            e] is a parsing expression as well. This is the
                            kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of [list +
                            e] is a parsing expression as well. This is the
                            positive kleene closure, describing one or more
                            repetitions.

                     [5]    For a parsing expression e the result of [list &
                            e] is a parsing expression as well. This is the
                            and lookahead predicate.

                     [6]    For a parsing expression e the result of [list !
                            e] is a parsing expression as well. This is the
                            not lookahead predicate.

                     [7]    For a parsing expression e the result of [list ?
                            e] is a parsing expression as well. This is the
                            optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among all
              the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

                  Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}
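
       Because parsing expressions are plain Tcl lists, they can be built
       with list and inspected with the regular list commands. The following
       is a minimal sketch in plain Tcl which rebuilds the expression of the
       example. The helper nonterminals is hypothetical and not part of any
       package. Note that the result of list carries no outer braces; those
       appear only when the expression is embedded in a larger list or
       dictionary.

```tcl
package require Tcl 8.5

# Build  Term (AddOp Term)*  from the inside out.
set repeat [list * [list x [list n AddOp] [list n Term]]]
set pe     [list x [list n Term] $repeat]
puts $pe

# Hypothetical helper: collect the nonterminal symbols referenced by a
# parsing expression, each once, in order of first appearance.
proc nonterminals {pe} {
    set op [lindex $pe 0]
    if {$op eq "n"} {
        return [list [lindex $pe 1]]
    }
    set result {}
    if {$op in {/ x * + & ! ?}} {
        # Combined expression: all arguments are parsing expressions.
        foreach child [lrange $pe 1 end] {
            foreach symbol [nonterminals $child] {
                if {$symbol ni $result} {
                    lappend result $symbol
                }
            }
        }
    }
    # Any other atomic expression references no nonterminal.
    return $result
}

puts [nonterminals $pe]
```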

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
       matching, parser, parsing expression, parsing expression grammar, push
       down automaton, recursive descent, state, top-down parsing languages,
       transducer

CATEGORY

       Parsing and Grammars

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                               1.0.1                  pt::peg::import(n)