pt::peg::import(n)               Parser Tools               pt::peg::import(n)

______________________________________________________________________________

NAME

       pt::peg::import - PEG Import

SYNOPSIS

       package require Tcl 8.5

       package require snit

       package require configuration

       package require pt::peg

       package require pluginmgr

       package require pt::peg::import ?1?

       ::pt::peg::import objectName

       objectName method ?arg arg ...?

       objectName destroy

       objectName import text text ?format?

       objectName import file path ?format?

       objectName import object text object text ?format?

       objectName import object file object path ?format?

       objectName includes

       objectName include add path

       objectName include remove path

       objectName include clear

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In that
       case please read the overview provided by the Introduction to Parser
       Tools. That document is the entry point to the whole system the current
       package is a part of.

       This package provides a manager for parsing expression grammars. Each
       manager instance handles a set of plugins which import grammars from
       other formats, i.e. convert them from formats such as peg, container,
       json, etc.

       It resides in the Import section of the Core Layer of Parser Tools, and
       is one of the three pillars the management of parsing expression
       grammars rests on.

       IMAGE: arch_core_import

       The other two pillars are, as shown above,

       [1]    PEG Export, and

       [2]    PEG Storage

       For information about the data structure which is the major output of
       the manager objects provided by this package see the section PEG
       serialization format.

       The plugin system of our class is based on the package pluginmgr, and
       is configured to look for plugins using

       [1]    the environment variable GRAMMAR_PEG_IMPORT_PLUGINS,

       [2]    the environment variable GRAMMAR_PEG_PLUGINS,

       [3]    the environment variable GRAMMAR_PLUGINS,

       [4]    the path "~/.grammar/peg/import/plugin"

       [5]    the path "~/.grammar/peg/plugin"

       [6]    the path "~/.grammar/plugin"

       [7]    the path "~/.grammar/peg/import/plugins"

       [8]    the path "~/.grammar/peg/plugins"

       [9]    the path "~/.grammar/plugins"

       [10]   the registry entry
              "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PEG\IMPORT\PLUGINS"

       [11]   the registry entry
              "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PEG\PLUGINS"

       [12]   the registry entry "HKEY_CURRENT_USER\SOFTWARE\GRAMMAR\PLUGINS"

       The last three are used only when the package is run on a machine using
       the Windows(tm) operating system.

       The whole system is delivered with three predefined import plugins,
       namely

       container
              See "PEG Import Plugin. From CONTAINER format" for details.

       json   See "PEG Import Plugin. From JSON format" for details.

       peg    See "PEG Import Plugin. From PEG format" for details.

       For readers wishing to write their own import plugin for some format,
       i.e. plugin writers, reading and understanding the Parser Tools Import
       API specification is an absolute necessity, as it documents the
       interaction between this package and its plugins in detail.

API

   PACKAGE COMMANDS
       ::pt::peg::import objectName
              This command creates a new import manager object with an
              associated Tcl command whose name is objectName. This object
              command is explained in full detail in the sections Object
              command and Object methods. The object command will be created
              under the current namespace if the objectName is not fully
              qualified, and in the specified namespace otherwise.

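       The naming rule above can be illustrated with a minimal sketch. It
       assumes that Tcllib is installed; the names importer and ::myapp are
       hypothetical and chosen only for this illustration.

```tcl
package require pt::peg::import

# Unqualified name: the object command is created in the current
# namespace, which is the global one at script level.
pt::peg::import importer

# Fully qualified name: the object command is created in the named
# namespace. The namespace is created first to be on the safe side.
namespace eval ::myapp {}
pt::peg::import ::myapp::importer

# Remember which commands exist before cleaning up.
set created [info commands ::myapp::importer]

# Release both managers when they are no longer needed.
importer destroy
::myapp::importer destroy
```
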
   OBJECT COMMAND
       All objects created by the ::pt::peg::import command have the following
       general form:

       objectName method ?arg arg ...?
              The method method and its arguments determine the exact behavior
              of the command. See section Object methods for the detailed
              specifications.

   OBJECT METHODS
       objectName destroy
              This method destroys the object it is invoked for.

       objectName import text text ?format?
              This method takes the text and converts it from the specified
              format to the canonical serialization of a parsing expression
              grammar, using the import plugin for the format. An error is
              thrown if no plugin could be found for the format. The
              serialization generated by the conversion process is returned as
              the result of this method.

              If no format is specified the method defaults to text.

              The specification of what a canonical serialization is can be
              found in the section PEG serialization format.

              The plugin has to conform to the interface documented in the
              Parser Tools Import API specification.

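       A minimal sketch of import text follows. It assumes that Tcllib and its
       predefined peg plugin are installed; the grammar and the variable
       names are made up for illustration, and the exact text of error
       messages is not specified by this document.

```tcl
package require pt::peg::import

pt::peg::import importer

# A tiny grammar in the textual PEG format (illustrative only).
set grammar {
PEG digits (Number)
    Digit  <- '0'/'1' ;
    Number <- Digit+  ;
END;
}

# Convert the text with the help of the "peg" plugin. Per the section
# PEG serialization format the result is a dictionary with the single
# key pt::grammar::peg.
set serial [importer import text $grammar peg]
puts [dict keys $serial]

# A format without a plugin raises an error, which is trapped here.
set failed [catch {importer import text $grammar no-such-format} msg]
if {$failed} {
    puts "import failed: $msg"
}

importer destroy
```
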
       objectName import file path ?format?
              This method is a convenient wrapper around the import text
              method described by the previous item. It reads the contents of
              the specified file into memory, feeds the result into import
              text and returns the resulting serialization as its own result.

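       The relationship between import file and import text can be sketched
       as follows, assuming Tcllib is installed. The scratch file digits.peg
       in the current directory is a hypothetical name used only here.

```tcl
package require pt::peg::import

# Write a small grammar to a scratch file (hypothetical name).
set path [file join [pwd] digits.peg]
set chan [open $path w]
puts $chan "PEG digits (Digit)\n    Digit <- '0'/'1' ;\nEND;"
close $chan

pt::peg::import importer
set from_file [importer import file $path peg]

# import file is described as a wrapper around import text, so feeding
# the same content through import text is expected to give the same
# serialization.
set chan [open $path r]
set text [read $chan]
close $chan
set from_text [importer import text $text peg]
puts [string equal $from_file $from_text]

importer destroy
file delete $path
```
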
       objectName import object text object text ?format?
              This method is a convenient wrapper around the import text
              method described above. It expects that object is an object
              command supporting a deserialize method which takes the
              canonical serialization of a parsing expression grammar. It
              imports the text using import text and then feeds the resulting
              serialization into the object via deserialize. This method
              returns the empty string as its result.

       objectName import object file object path ?format?
              This method behaves like import object text, except that it
              reads the text to convert from the specified file instead of
              receiving it as an argument.

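       The object-based variants can be tried with any command that accepts a
       deserialize method. The sketch below assumes Tcllib is installed and
       uses a hypothetical stand-in command named receiver. How exactly the
       manager invokes deserialize is an assumption here, so the stand-in
       simply treats its last argument as the serialization.

```tcl
package require pt::peg::import

# Hypothetical receiver standing in for a real grammar container.
proc receiver {method args} {
    if {$method ne "deserialize"} {
        return -code error "unknown method \"$method\""
    }
    # Assumption: the serialization is the last argument of the call.
    set ::received [lindex $args end]
    return
}

pt::peg::import importer
set grammar {PEG digits (Digit) Digit <- '0'/'1' ; END;}

# The method result is the empty string; the serialization itself is
# handed to the receiver.
set result [importer import object text receiver $grammar peg]
puts "result='$result'"
puts [dict exists $::received pt::grammar::peg]

importer destroy
```
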
       objectName includes
              This method returns a list containing the currently specified
              paths to use to search for include files when processing input.
              The order of paths in the list corresponds to the order in which
              they are used, from first to last, and also corresponds to the
              order in which they were added to the object.

       objectName include add path
              This method adds the specified path to the list of paths to use
              to search for include files when processing input. The path is
              added to the end of the list, causing it to be searched after
              all previously added paths. The result of the command is the
              empty string.

              The method does nothing if the path is already known.

       objectName include remove path
              This method removes the specified path from the list of paths
              to use to search for include files when processing input. The
              result of the command is the empty string.

              The method does nothing if the path is not known.

       objectName include clear
              This method clears the list of paths to use to search for
              include files when processing input. The result of the command
              is the empty string.

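       The include path methods might be combined as in the following sketch,
       which assumes Tcllib is installed. The directories are hypothetical
       and need not exist for the calls shown.

```tcl
package require pt::peg::import

pt::peg::import importer

# Paths are searched in the order in which they were added.
importer include add /usr/share/grammars
importer include add /home/me/grammars

# Adding a known path again is documented to do nothing.
importer include add /usr/share/grammars

set paths [importer includes]
puts $paths

# Remove a single path, then drop whatever is left.
importer include remove /usr/share/grammars
importer include clear
set remaining [importer includes]
puts [llength $remaining]

importer destroy
```
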

PEG SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of them
       will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg, and
                     its value. This value holds the contents of the grammar.

              [3]    The contents of the grammar are a Tcl dictionary holding
                     the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal nonterminal
                                   symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three values
                                          specifying how a parser should
                                          handle the semantic value produced
                                          by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are discarded
                                                 (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified in
                            the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in the
                     start expression and on the RHS of the grammar rules.

       canonical serialization
              The canonical serialization of a grammar has the format
              specified in the previous item, and additionally satisfies the
              constraints below, which make it unique among all the possible
              serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary. I.e. it does not
                     contain superfluous whitespace.

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}}                                                                mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}}                mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}}                                        mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}}                                  mode value}
                      MulOp      {is {/ {t *} {t /}}                                                                mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}}                                                 mode value}
                      Sign       {is {/ {t -} {t +}}                                                                mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}}                                    mode value}
                  }
                  start {n Expression}
              }

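       Because a serialization is an ordinary nested Tcl dictionary, it can
       be inspected with the builtin dict command alone. The following sketch
       needs nothing beyond core Tcl 8.5; the two-rule grammar in it is made
       up for illustration and is a regular, not a canonical, serialization.

```tcl
# Regular serialization of a small grammar. Keys are deliberately not
# sorted and the value contains extra whitespace.
set serial {
    pt::grammar::peg {
        start {n Number}
        rules {
            Number {mode value is {+ {n Digit}}}
            Digit  {mode leaf  is {/ {t 0} {t 1}}}
        }
    }
}

set grammar [dict get $serial pt::grammar::peg]

# Nonterminals in the order a canonical serialization would list them.
set symbols [lsort -increasing -dict [dict keys [dict get $grammar rules]]]
puts $symbols

# Walk the rules and show mode and parsing expression of each symbol.
foreach sym $symbols {
    set mode [dict get $grammar rules $sym mode]
    set pe   [dict get $grammar rules $sym is]
    puts "$sym: mode=$mode is=$pe"
}
puts "start: [dict get $grammar start]"
```
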

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison, etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression. It
                            matches any character.

                     [3]    The string alnum is an atomic parsing expression.
                            It matches any Unicode alphabet or digit
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing expression.
                            It matches any Unicode alphabet character. This is
                            a custom extension of PEs based on Tcl's builtin
                            command string is.

                     [5]    The string ascii is an atomic parsing expression.
                            It matches any Unicode character below U+0080.
                            This is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing expression.
                            It matches any Unicode digit character. Note that
                            this includes characters outside of the [0..9]
                            range. This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing expression.
                            It matches any Unicode printing character, except
                            for space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [9]    The string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing expression.
                            It matches any Unicode printing character,
                            including space. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [11]   The string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing expression.
                            It matches any Unicode space character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command string is.

                     [13]   The string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabet
                            character. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word character.
                            This is any alphanumeric character (see alnum),
                            and any connector punctuation characters (e.g.
                            underscore). This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [15]   The string xdigit is an atomic parsing expression.
                            It matches any hexadecimal digit character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [16]   The string ddigit is an atomic parsing expression.
                            It matches any decimal digit character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is a parsing expression as
                            well. This is the ordered choice, aka prioritized
                            choice.

                     [2]    For parsing expressions e1, e2, ... the result of
                            [list x e1 e2 ... ] is a parsing expression as
                            well. This is the sequence.

                     [3]    For a parsing expression e the result of [list *
                            e] is a parsing expression as well. This is the
                            Kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of [list +
                            e] is a parsing expression as well. This is the
                            positive Kleene closure, describing one or more
                            repetitions.

                     [5]    For a parsing expression e the result of [list &
                            e] is a parsing expression as well. This is the
                            and lookahead predicate.

                     [6]    For a parsing expression e the result of [list !
                            e] is a parsing expression as well. This is the
                            not lookahead predicate.

                     [7]    For a parsing expression e the result of [list ?
                            e] is a parsing expression as well. This is the
                            optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format specified in the previous item, and additionally
              satisfies the constraints below, which make it unique among all
              the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

                  Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}

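       The [list ...] constructions above translate directly into Tcl code.
       The sketch below uses only core Tcl 8.5. The helper canonical_pe is
       hypothetical and not part of Parser Tools; it merely illustrates the
       whitespace constraint by rebuilding every nested list, and it handles
       only the operators listed in this section.

```tcl
# Build the expression "Term (AddOp Term)*" with plain list commands.
set pe [list x [list n Term] [list * [list x [list n AddOp] [list n Term]]]]
puts $pe

# Hypothetical helper: rebuild a regular serialization so that its
# string representation carries no superfluous whitespace.
proc canonical_pe {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        / - x - * - + - & - ! - ? {
            # Combined expression: normalize each operand recursively.
            set out [list $op]
            foreach sub [lrange $pe 1 end] {
                lappend out [canonical_pe $sub]
            }
            return $out
        }
        default {
            # Atomic expression (t, n, epsilon, dot, ...): rebuild the
            # list from its elements.
            return [list {*}$pe]
        }
    }
}

# The same expression written with irregular spacing.
set sloppy "x   {n  Term}  {*  {x {n AddOp}   {n Term}}}"
set same [string equal [canonical_pe $sloppy] $pe]
puts $same
```

       When such an expression is stored as an element of a larger list or
       dictionary, Tcl adds the outer pair of braces seen in the example
       above.
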

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
       matching, parser, parsing expression, parsing expression grammar, push
       down automaton, recursive descent, state, top-down parsing languages,
       transducer

CATEGORY

       Parsing and Grammars

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1                    pt::peg::import(n)