1pt::peg::interp(n)               Parser Tools               pt::peg::interp(n)
2
3
4
5______________________________________________________________________________
6

NAME

8       pt::peg::interp - Interpreter for parsing expression grammars
9

SYNOPSIS

11       package require Tcl  8.5
12
13       package require pt::peg::interp  ?1.0.1?
14
15       package require pt::rde  ?1?
16
17       package require snit
18
19       ::pt::peg::interp objectName grammar
20
21       objectName use grammar
22
23       objectName destroy
24
25       objectName parse chan
26
27       objectName parset text
28
29______________________________________________________________________________
30

DESCRIPTION

32       Are  you  lost ?  Do you have trouble understanding this document ?  In
33       that case please read the overview  provided  by  the  Introduction  to
34       Parser  Tools.  This document is the entrypoint to the whole system the
35       current package is a part of.
36
37       This package provides a class whose instances are Packrat parsers  con‐
38       figurable  with  a  parsing expression grammar. The grammar is executed
39       directly, i.e. interpreted, with the underlying runtime provided by the
40       package pt::rde, basing everything on the PARAM.
41
42       Like  the supporting runtime this package resides in the Execution sec‐
43       tion of the Core Layer of Parser Tools.
44
45       IMAGE: arch_core_transform
46
47       The interpreted grammar is copied from  an  instance  of  pt::peg::con‐
48       tainer,  or anything providing the same API, like the container classes
49       created by  pt::peg::to::container  or  the  associated  export  plugin
50       pt::peg::export::container.
51
52   CLASS API
53       The package exports the API described here.
54
55       ::pt::peg::interp objectName grammar
56              The  command  creates  a new parser object and returns the fully
57              qualified name of the object command as its result. The  API  of
58              this  object  command is described in the section Object API. It
59              may be used to invoke various operations on the object.
60
61              This new parser is configured for the execution of an empty PEG.
62              To  configure the object for any other PEG use the method use of
63              the Object API.
64
65   OBJECT API
66       All objects created by this package provide the following methods.
67
68       objectName use grammar
69              This method configures the grammar interpreter / parser for  the
70              execution  of the PEG stored in grammar, an object which is API-
71              compatible to instances of pt::peg::container. The parser copies
72              the  relevant information of the grammar, and does not take own‐
73              ership of the object.
74
75              The information of any previously used grammar is overwritten.
76
77              The result of the method the empty string.
78
79       objectName destroy
80              This method destroys the parser instance, releasing all  claimed
81              memory and other resources, and deleting the instance command.
82
83              The result of the command is the empty string.
84
85       objectName parse chan
86              This  method runs the parser using the contents of chan as input
87              (starting at the current location in the channel), until parsing
88              is  not  possible anymore, either because parsing has completed,
89              or run into a syntax error.
90
91              Note here that the Parser Tools are based on Tcl 8.5+. In  other
92              words, the channel argument is not restricted to files, sockets,
93              etc. We have the full power of reflected channels available.
94
95              It should also be noted that the  parser  pulls  the  characters
96              from  the  input stream as it needs them. If a parser created by
97              this package has to be operated in a push aka event-driven  man‐
98              ner  it  will  be necessary to go to Tcl 8.6+ and use the corou‐
99              tine::auto to wrap it into a coroutine where  read  is  properly
100              changed for push-operation.
101
102              Upon  successful completion the command returns an abstract syn‐
103              tax tree as its result.  This AST is in the  form  specified  in
104              section AST serialization format.  As a plain nested Tcl-list it
105              can then be processed with any Tcl commands the user likes,  do‐
106              ing  transformations, semantic checks, etc.  To help in this the
107              package pt::ast provides a set of convenience commands for vali‐
108              dation of the tree's basic structure, printing it for debugging,
109              and walking it either from the bottom up, or top down.
110
111              When encountering a syntax error the command will throw an error
112              instead.   This  error will be a 4-element Tcl-list, containing,
113              in the order listed below:
114
115              [1]    The string pt::rde identifying it as parser  runtime  er‐
116                     ror.
117
118              [2]    The location of the parse error, as character offset from
119                     the beginning of the parsed input.
120
121              [3]    The location of parse error, now as a 2-element list con‐
122                     taining line-number and column in the line.
123
124              [4]    A  set  of atomic parsing expressions indicating encoding
125                     the characters and/or nonterminal symbols the parser  ex‐
126                     pected to see at the location of the parse error, but did
127                     not get.  For the specification of atomic parsing expres‐
128                     sions please see the section PE serialization format.
129
130       objectName parset text
131              This  method  runs the parser using the string in text as input.
132              In all other ways it behaves like the method parse, shown above.
133

AST SERIALIZATION FORMAT

135       Here we specify the format used by the Parser Tools  to  serialize  Ab‐
136       stract  Syntax Trees (ASTs) as immutable values for transport, compari‐
137       son, etc.
138
139       Each node in an AST represents a nonterminal symbol of a  grammar,  and
140       the  range of tokens/characters in the input covered by it. ASTs do not
141       contain terminal symbols, i.e. tokens/characters. These can  be  recov‐
142       ered from the input given a symbol's location.
143
144       We  distinguish  between regular and canonical serializations.  While a
145       tree may have more than one regular serialization only exactly  one  of
146       them will be canonical.
147
148       Regular serialization
149
150              [1]    The  serialization of any AST is the serialization of its
151                     root node.
152
153              [2]    The serialization of any node is a Tcl list containing at
154                     least three elements.
155
156                     [1]    The  first  element is the name of the nonterminal
157                            symbol stored in the node.
158
159                     [2]    The second and third element are the locations  of
160                            the  first  and last token in the token stream the
161                            node represents (covers).
162
163                            [1]    Locations are provided as non-negative  in‐
164                                   teger offsets from the beginning of the to‐
165                                   ken stream, with the first token  found  in
166                                   the stream located at offset 0 (zero).
167
168                            [2]    The  end  location  has  to  be equal to or
169                                   larger than the start location.
170
171                     [3]    All elements after the first three  represent  the
172                            children  of the node, which are themselves nodes.
173                            This means that the serializations of nodes  with‐
174                            out  children, i.e. leaf nodes, have exactly three
175                            elements.  The children are  stored  in  the  list
176                            with  the  leftmost child first, and the rightmost
177                            child last.
178
179       Canonical serialization
180              The canonical serialization of an abstract syntax tree  has  the
181              format  as specified in the previous item, and then additionally
182              satisfies the constraints below, which make it unique among  all
183              the possible serializations of this tree.
184
185              [1]    The  string  representation of the value is the canonical
186                     representation of a pure Tcl list. I.e. it does not  con‐
187                     tain superfluous whitespace.
188
189   EXAMPLE
190       Assuming the parsing expression grammar below
191
192              PEG calculator (Expression)
193                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
194                  Sign       <- '-' / '+'                                     ;
195                  Number     <- Sign? Digit+                                  ;
196                  Expression <- Term (AddOp Term)*                            ;
197                  MulOp      <- '*' / '/'                                     ;
198                  Term       <- Factor (MulOp Factor)*                        ;
199                  AddOp      <- '+'/'-'                                       ;
200                  Factor     <- '(' Expression ')' / Number                   ;
201              END;
202
203
204       and the input string
205
206               120+5
207       then a parser should deliver the abstract syntax tree below (except for
208       whitespace)
209
210              set ast {Expression 0 4
211                  {Factor 0 4
212                      {Term 0 2
213                          {Number 0 2
214                              {Digit 0 0}
215                              {Digit 1 1}
216                              {Digit 2 2}
217                          }
218                      }
219                      {AddOp 3 3}
220                      {Term 4 4
221                          {Number 4 4
222                              {Digit 4 4}
223                          }
224                      }
225                  }
226              }
227
228
229       Or, more graphical
230
231       .nf +- Digit 0 0 | 1 |            | +- Term 0 2  ---  Number  0  2  -+-
232       Digit   1   1   |   2   |                            |             |  |
233       +- Digit 2 2 | 0 |                                        |  Expression
234       0  4  ---  Factor  0  4 -+----------------------------- AddOp 3 3 | + |
235       | +- Term 4 4 --- Number 4 4 --- Digit 4 4 | 5 .fi
236

PE SERIALIZATION FORMAT

238       Here we specify the format used by the Parser Tools to serialize  Pars‐
239       ing Expressions as immutable values for transport, comparison, etc.
240
241       We  distinguish  between regular and canonical serializations.  While a
242       parsing expression may have more than one  regular  serialization  only
243       exactly one of them will be canonical.
244
245       Regular serialization
246
247              Atomic Parsing Expressions
248
249                     [1]    The  string  epsilon  is an atomic parsing expres‐
250                            sion. It matches the empty string.
251
252                     [2]    The string dot is an atomic parsing expression. It
253                            matches any character.
254
255                     [3]    The  string alnum is an atomic parsing expression.
256                            It matches any Unicode alphabet or  digit  charac‐
257                            ter.  This  is  a custom extension of PEs based on
258                            Tcl's builtin command string is.
259
260                     [4]    The string alpha is an atomic parsing  expression.
261                            It matches any Unicode alphabet character. This is
262                            a custom extension of PEs based on  Tcl's  builtin
263                            command string is.
264
265                     [5]    The  string ascii is an atomic parsing expression.
266                            It matches any Unicode character below U0080. This
267                            is  a  custom  extension  of  PEs  based  on Tcl's
268                            builtin command string is.
269
270                     [6]    The string control is an  atomic  parsing  expres‐
271                            sion.  It  matches  any Unicode control character.
272                            This is a custom extension of PEs based  on  Tcl's
273                            builtin command string is.
274
275                     [7]    The  string digit is an atomic parsing expression.
276                            It matches any Unicode digit character. Note  that
277                            this  includes  characters  outside  of the [0..9]
278                            range. This is a custom extension of PEs based  on
279                            Tcl's builtin command string is.
280
281                     [8]    The  string graph is an atomic parsing expression.
282                            It matches any Unicode printing character,  except
283                            for space. This is a custom extension of PEs based
284                            on Tcl's builtin command string is.
285
286                     [9]    The string lower is an atomic parsing  expression.
287                            It matches any Unicode lower-case alphabet charac‐
288                            ter. This is a custom extension of  PEs  based  on
289                            Tcl's builtin command string is.
290
291                     [10]   The  string print is an atomic parsing expression.
292                            It matches any Unicode printing character, includ‐
293                            ing space. This is a custom extension of PEs based
294                            on Tcl's builtin command string is.
295
296                     [11]   The string punct is an atomic parsing  expression.
297                            It matches any Unicode punctuation character. This
298                            is a  custom  extension  of  PEs  based  on  Tcl's
299                            builtin command string is.
300
301                     [12]   The  string space is an atomic parsing expression.
302                            It matches any Unicode space character. This is  a
303                            custom  extension  of  PEs  based on Tcl's builtin
304                            command string is.
305
306                     [13]   The string upper is an atomic parsing  expression.
307                            It matches any Unicode upper-case alphabet charac‐
308                            ter. This is a custom extension of  PEs  based  on
309                            Tcl's builtin command string is.
310
311                     [14]   The  string  wordchar is an atomic parsing expres‐
312                            sion. It matches any Unicode word character.  This
313                            is any alphanumeric character (see alnum), and any
314                            connector  punctuation  characters  (e.g.   under‐
315                            score). This is a custom extension of PEs based on
316                            Tcl's builtin command string is.
317
318                     [15]   The string xdigit is an atomic parsing expression.
319                            It  matches  any hexadecimal digit character. This
320                            is a  custom  extension  of  PEs  based  on  Tcl's
321                            builtin command string is.
322
323                     [16]   The string ddigit is an atomic parsing expression.
324                            It matches any decimal digit character. This is  a
325                            custom  extension  of  PEs  based on Tcl's builtin
326                            command regexp.
327
328                     [17]   The expression [list t x] is an atomic parsing ex‐
329                            pression. It matches the terminal string x.
330
331                     [18]   The expression [list n A] is an atomic parsing ex‐
332                            pression. It matches the nonterminal A.
333
334              Combined Parsing Expressions
335
336                     [1]    For parsing expressions e1, e2, ... the result  of
337                            [list  /  e1  e2  ... ] is a parsing expression as
338                            well.  This is the ordered choice, aka prioritized
339                            choice.
340
341                     [2]    For  parsing expressions e1, e2, ... the result of
342                            [list x e1 e2 ... ] is  a  parsing  expression  as
343                            well.  This is the sequence.
344
345                     [3]    For  a  parsing expression e the result of [list *
346                            e] is a parsing expression as well.  This  is  the
347                            kleene  closure,  describing  zero or more repeti‐
348                            tions.
349
350                     [4]    For a parsing expression e the result of  [list  +
351                            e]  is  a parsing expression as well.  This is the
352                            positive kleene closure, describing  one  or  more
353                            repetitions.
354
355                     [5]    For  a  parsing expression e the result of [list &
356                            e] is a parsing expression as well.  This  is  the
357                            and lookahead predicate.
358
359                     [6]    For  a  parsing expression e the result of [list !
360                            e] is a parsing expression as well.  This  is  the
361                            not lookahead predicate.
362
363                     [7]    For  a  parsing expression e the result of [list ?
364                            e] is a parsing expression as well.  This  is  the
365                            optional input.
366
367       Canonical serialization
368              The canonical serialization of a parsing expression has the for‐
369              mat as specified in the previous  item,  and  then  additionally
370              satisfies  the constraints below, which make it unique among all
371              the possible serializations of this parsing expression.
372
373              [1]    The string representation of the value is  the  canonical
374                     representation  of a pure Tcl list. I.e. it does not con‐
375                     tain superfluous whitespace.
376
377              [2]    Terminals are not encoded as ranges (where start and  end
378                     of the range are identical).
379
380   EXAMPLE
381       Assuming  the  parsing  expression  shown on the right-hand side of the
382       rule
383
384                  Expression <- Term (AddOp Term)*
385
386
387       then its canonical serialization (except for whitespace) is
388
389                  {x {n Term} {* {x {n AddOp} {n Term}}}}
390
391

BUGS, IDEAS, FEEDBACK

393       This document, and the package it describes, will  undoubtedly  contain
394       bugs  and other problems.  Please report such in the category pt of the
395       Tcllib Trackers  [http://core.tcl.tk/tcllib/reportlist].   Please  also
396       report  any  ideas  for  enhancements  you  may have for either package
397       and/or documentation.
398
399       When proposing code changes, please provide unified diffs, i.e the out‐
400       put of diff -u.
401
402       Note  further  that  attachments  are  strongly  preferred over inlined
403       patches. Attachments can be made by going  to  the  Edit  form  of  the
404       ticket  immediately  after  its  creation, and then using the left-most
405       button in the secondary navigation bar.
406

KEYWORDS

408       EBNF, LL(k), PEG, TDPL, context-free  languages,  expression,  grammar,
409       matching,  parser, parsing expression, parsing expression grammar, push
410       down automaton, recursive descent, state, top-down  parsing  languages,
411       transducer
412

CATEGORY

414       Parsing and Grammars
415
417       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>
418
419
420
421
422tcllib                               1.0.1                  pt::peg::interp(n)
Impressum