1pt::peg::interp(n)               Parser Tools               pt::peg::interp(n)
2
3
4
5______________________________________________________________________________
6

NAME

8       pt::peg::interp - Interpreter for parsing expression grammars
9

SYNOPSIS

11       package require Tcl  8.5
12
13       package require pt::peg::interp  ?1.0.1?
14
15       package require pt::rde  ?1?
16
17       package require snit
18
19       ::pt::peg::interp objectName grammar
20
21       objectName use grammar
22
23       objectName destroy
24
25       objectName parse chan
26
27       objectName parset text
28
29______________________________________________________________________________
30

DESCRIPTION

32       Are  you  lost ?  Do you have trouble understanding this document ?  In
33       that case please read the overview  provided  by  the  Introduction  to
34       Parser  Tools.  This document is the entrypoint to the whole system the
35       current package is a part of.
36
37       This package provides a class whose instances are Packrat parsers  con‐
38       figurable  with  a  parsing expression grammar. The grammar is executed
39       directly, i.e. interpreted, with the underlying runtime provided by the
40       package pt::rde, basing everything on the PARAM.
41
42       Like  the supporting runtime this package resides in the Execution sec‐
43       tion of the Core Layer of Parser Tools.
44
45       IMAGE: arch_core_transform
46
47       The interpreted grammar is copied from  an  instance  of  pt::peg::con‐
48       tainer,  or anything providing the same API, like the container classes
49       created by  pt::peg::to::container  or  the  associated  export  plugin
50       pt::peg::export::container.
51
52   CLASS API
53       The package exports the API described here.
54
55       ::pt::peg::interp objectName grammar
56              The  command  creates  a new parser object and returns the fully
57              qualified name of the object command as its result. The  API  of
58              this  object  command is described in the section Object API. It
59              may be used to invoke various operations on the object.
60
61              This new parser is configured for the execution of an empty PEG.
62              To  configure the object for any other PEG use the method use of
63              the Object API.
64
65   OBJECT API
66       All objects created by this package provide the following methods.
67
68       objectName use grammar
69              This method configures the grammar interpreter / parser for  the
70              execution  of the PEG stored in grammar, an object which is API-
71              compatible to instances of pt::peg::container. The parser copies
72              the  relevant information of the grammar, and does not take own‐
73              ership of the object.
74
75              The information of any previously used grammar is overwritten.
76
77              The result of the method the empty string.
78
79       objectName destroy
80              This method destroys the parser instance, releasing all  claimed
81              memory and other resources, and deleting the instance command.
82
83              The result of the command is the empty string.
84
85       objectName parse chan
86              This  method runs the parser using the contents of chan as input
87              (starting at the current location in the channel), until parsing
88              is  not  possible anymore, either because parsing has completed,
89              or run into a syntax error.
90
91              Note here that the Parser Tools are based on Tcl 8.5+. In  other
92              words, the channel argument is not restricted to files, sockets,
93              etc. We have the full power of reflected channels available.
94
95              It should also be noted that the  parser  pulls  the  characters
96              from  the  input stream as it needs them. If a parser created by
97              this package has to be operated in a push aka event-driven  man‐
98              ner  it  will  be necessary to go to Tcl 8.6+ and use the corou‐
99              tine::auto to wrap it into a coroutine where  read  is  properly
100              changed for push-operation.
101
102              Upon  successful completion the command returns an abstract syn‐
103              tax tree as its result.  This AST is in the  form  specified  in
104              section AST serialization format.  As a plain nested Tcl-list it
105              can then be processed with any  Tcl  commands  the  user  likes,
106              doing  transformations,  semantic  checks, etc.  To help in this
107              the package pt::ast provides a set of convenience  commands  for
108              validation of the tree's basic structure, printing it for debug‐
109              ging, and walking it either from the bottom up, or top down.
110
111              When encountering a syntax error the command will throw an error
112              instead.   This  error will be a 4-element Tcl-list, containing,
113              in the order listed below:
114
115              [1]    The string  pt::rde  identifying  it  as  parser  runtime
116                     error.
117
118              [2]    The location of the parse error, as character offset from
119                     the beginning of the parsed input.
120
121              [3]    The location of parse error, now as a 2-element list con‐
122                     taining line-number and column in the line.
123
124              [4]    A  set  of atomic parsing expressions indicating encoding
125                     the characters  and/or  nonterminal  symbols  the  parser
126                     expected  to  see at the location of the parse error, but
127                     did not get.  For the  specification  of  atomic  parsing
128                     expressions  please see the section PE serialization for‐
129                     mat.
130
131       objectName parset text
132              This method runs the parser using the string in text  as  input.
133              In all other ways it behaves like the method parse, shown above.
134

AST SERIALIZATION FORMAT

136       Here  we  specify  the  format  used  by  the Parser Tools to serialize
137       Abstract Syntax Trees (ASTs) as immutable values for transport, compar‐
138       ison, etc.
139
140       Each  node  in an AST represents a nonterminal symbol of a grammar, and
141       the range of tokens/characters in the input covered by it. ASTs do  not
142       contain  terminal  symbols, i.e. tokens/characters. These can be recov‐
143       ered from the input given a symbol's location.
144
145       We distinguish between regular and canonical serializations.   While  a
146       tree  may  have more than one regular serialization only exactly one of
147       them will be canonical.
148
149       Regular serialization
150
151              [1]    The serialization of any AST is the serialization of  its
152                     root node.
153
154              [2]    The serialization of any node is a Tcl list containing at
155                     least three elements.
156
157                     [1]    The first element is the name of  the  nonterminal
158                            symbol stored in the node.
159
160                     [2]    The  second and third element are the locations of
161                            the first and last token in the token  stream  the
162                            node represents (covers).
163
164                            [1]    Locations   are  provided  as  non-negative
165                                   integer offsets from the beginning  of  the
166                                   token stream, with the first token found in
167                                   the stream located at offset 0 (zero).
168
169                            [2]    The end location has  to  be  equal  to  or
170                                   larger than the start location.
171
172                     [3]    All  elements  after the first three represent the
173                            children of the node, which are themselves  nodes.
174                            This  means that the serializations of nodes with‐
175                            out children, i.e. leaf nodes, have exactly  three
176                            elements.   The  children  are  stored in the list
177                            with the leftmost child first, and  the  rightmost
178                            child last.
179
180       Canonical serialization
181              The  canonical  serialization of an abstract syntax tree has the
182              format as specified in the previous item, and then  additionally
183              satisfies  the constraints below, which make it unique among all
184              the possible serializations of this tree.
185
186              [1]    The string representation of the value is  the  canonical
187                     representation  of a pure Tcl list. I.e. it does not con‐
188                     tain superfluous whitespace.
189
190   EXAMPLE
191       Assuming the parsing expression grammar below
192
193              PEG calculator (Expression)
194                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
195                  Sign       <- '-' / '+'                                     ;
196                  Number     <- Sign? Digit+                                  ;
197                  Expression <- Term (AddOp Term)*                            ;
198                  MulOp      <- '*' / '/'                                     ;
199                  Term       <- Factor (MulOp Factor)*                        ;
200                  AddOp      <- '+'/'-'                                       ;
201                  Factor     <- '(' Expression ')' / Number                   ;
202              END;
203
204
205       and the input string
206
207               120+5
208       then a parser should deliver the abstract syntax tree below (except for
209       whitespace)
210
211              set ast {Expression 0 4
212                  {Factor 0 4
213                      {Term 0 2
214                          {Number 0 2
215                              {Digit 0 0}
216                              {Digit 1 1}
217                              {Digit 2 2}
218                          }
219                      }
220                      {AddOp 3 3}
221                      {Term 4 4
222                          {Number 4 4
223                              {Digit 4 4}
224                          }
225                      }
226                  }
227              }
228
229
230       Or, more graphical
231
232       .nf  +-  Digit  0  0  | 1 |            | +- Term 0 2 --- Number 0 2 -+-
233       Digit  1  1  |   2   |                             |              |   |
234       +-  Digit 2 2 | 0 |                                        | Expression
235       0 4 --- Factor 0 4 -+----------------------------- AddOp  3  3  |  +  |
236       | +- Term 4 4 --- Number 4 4 --- Digit 4 4 | 5 .fi
237

PE SERIALIZATION FORMAT

239       Here  we specify the format used by the Parser Tools to serialize Pars‐
240       ing Expressions as immutable values for transport, comparison, etc.
241
242       We distinguish between regular and canonical serializations.   While  a
243       parsing  expression  may  have more than one regular serialization only
244       exactly one of them will be canonical.
245
246       Regular serialization
247
248              Atomic Parsing Expressions
249
250                     [1]    The string epsilon is an  atomic  parsing  expres‐
251                            sion. It matches the empty string.
252
253                     [2]    The string dot is an atomic parsing expression. It
254                            matches any character.
255
256                     [3]    The string alnum is an atomic parsing  expression.
257                            It  matches  any Unicode alphabet or digit charac‐
258                            ter. This is a custom extension of  PEs  based  on
259                            Tcl's builtin command string is.
260
261                     [4]    The  string alpha is an atomic parsing expression.
262                            It matches any Unicode alphabet character. This is
263                            a  custom  extension of PEs based on Tcl's builtin
264                            command string is.
265
266                     [5]    The string ascii is an atomic parsing  expression.
267                            It matches any Unicode character below U0080. This
268                            is a  custom  extension  of  PEs  based  on  Tcl's
269                            builtin command string is.
270
271                     [6]    The  string  control  is an atomic parsing expres‐
272                            sion. It matches any  Unicode  control  character.
273                            This  is  a custom extension of PEs based on Tcl's
274                            builtin command string is.
275
276                     [7]    The string digit is an atomic parsing  expression.
277                            It  matches any Unicode digit character. Note that
278                            this includes characters  outside  of  the  [0..9]
279                            range.  This is a custom extension of PEs based on
280                            Tcl's builtin command string is.
281
282                     [8]    The string graph is an atomic parsing  expression.
283                            It  matches any Unicode printing character, except
284                            for space. This is a custom extension of PEs based
285                            on Tcl's builtin command string is.
286
287                     [9]    The  string lower is an atomic parsing expression.
288                            It matches any Unicode lower-case alphabet charac‐
289                            ter.  This  is  a custom extension of PEs based on
290                            Tcl's builtin command string is.
291
292                     [10]   The string print is an atomic parsing  expression.
293                            It matches any Unicode printing character, includ‐
294                            ing space. This is a custom extension of PEs based
295                            on Tcl's builtin command string is.
296
297                     [11]   The  string punct is an atomic parsing expression.
298                            It matches any Unicode punctuation character. This
299                            is  a  custom  extension  of  PEs  based  on Tcl's
300                            builtin command string is.
301
302                     [12]   The string space is an atomic parsing  expression.
303                            It  matches any Unicode space character. This is a
304                            custom extension of PEs  based  on  Tcl's  builtin
305                            command string is.
306
307                     [13]   The  string upper is an atomic parsing expression.
308                            It matches any Unicode upper-case alphabet charac‐
309                            ter.  This  is  a custom extension of PEs based on
310                            Tcl's builtin command string is.
311
312                     [14]   The string wordchar is an atomic  parsing  expres‐
313                            sion.  It matches any Unicode word character. This
314                            is any alphanumeric character (see alnum), and any
315                            connector  punctuation  characters  (e.g.   under‐
316                            score). This is a custom extension of PEs based on
317                            Tcl's builtin command string is.
318
319                     [15]   The string xdigit is an atomic parsing expression.
320                            It matches any hexadecimal digit  character.  This
321                            is  a  custom  extension  of  PEs  based  on Tcl's
322                            builtin command string is.
323
324                     [16]   The string ddigit is an atomic parsing expression.
325                            It  matches any decimal digit character. This is a
326                            custom extension of PEs  based  on  Tcl's  builtin
327                            command regexp.
328
329                     [17]   The  expression  [list  t  x] is an atomic parsing
330                            expression. It matches the terminal string x.
331
332                     [18]   The expression [list n A]  is  an  atomic  parsing
333                            expression. It matches the nonterminal A.
334
335              Combined Parsing Expressions
336
337                     [1]    For  parsing expressions e1, e2, ... the result of
338                            [list / e1 e2 ... ] is  a  parsing  expression  as
339                            well.  This is the ordered choice, aka prioritized
340                            choice.
341
342                     [2]    For parsing expressions e1, e2, ... the result  of
343                            [list  x  e1  e2  ... ] is a parsing expression as
344                            well.  This is the sequence.
345
346                     [3]    For a parsing expression e the result of  [list  *
347                            e]  is  a parsing expression as well.  This is the
348                            kleene closure, describing zero  or  more  repeti‐
349                            tions.
350
351                     [4]    For  a  parsing expression e the result of [list +
352                            e] is a parsing expression as well.  This  is  the
353                            positive  kleene  closure,  describing one or more
354                            repetitions.
355
356                     [5]    For a parsing expression e the result of  [list  &
357                            e]  is  a parsing expression as well.  This is the
358                            and lookahead predicate.
359
360                     [6]    For a parsing expression e the result of  [list  !
361                            e]  is  a parsing expression as well.  This is the
362                            not lookahead predicate.
363
364                     [7]    For a parsing expression e the result of  [list  ?
365                            e]  is  a parsing expression as well.  This is the
366                            optional input.
367
368       Canonical serialization
369              The canonical serialization of a parsing expression has the for‐
370              mat  as  specified  in  the previous item, and then additionally
371              satisfies the constraints below, which make it unique among  all
372              the possible serializations of this parsing expression.
373
374              [1]    The  string  representation of the value is the canonical
375                     representation of a pure Tcl list. I.e. it does not  con‐
376                     tain superfluous whitespace.
377
378              [2]    Terminals  are not encoded as ranges (where start and end
379                     of the range are identical).
380
381   EXAMPLE
382       Assuming the parsing expression shown on the  right-hand  side  of  the
383       rule
384
385                  Expression <- Term (AddOp Term)*
386
387
388       then its canonical serialization (except for whitespace) is
389
390                  {x {n Term} {* {x {n AddOp} {n Term}}}}
391
392

BUGS, IDEAS, FEEDBACK

394       This  document,  and the package it describes, will undoubtedly contain
395       bugs and other problems.  Please report such in the category pt of  the
396       Tcllib  Trackers  [http://core.tcl.tk/tcllib/reportlist].   Please also
397       report any ideas for enhancements  you  may  have  for  either  package
398       and/or documentation.
399
400       When proposing code changes, please provide unified diffs, i.e the out‐
401       put of diff -u.
402
403       Note further that  attachments  are  strongly  preferred  over  inlined
404       patches.  Attachments  can  be  made  by  going to the Edit form of the
405       ticket immediately after its creation, and  then  using  the  left-most
406       button in the secondary navigation bar.
407

KEYWORDS

409       EBNF,  LL(k),  PEG,  TDPL, context-free languages, expression, grammar,
410       matching, parser, parsing expression, parsing expression grammar,  push
411       down  automaton,  recursive descent, state, top-down parsing languages,
412       transducer
413

CATEGORY

415       Parsing and Grammars
416
418       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>
419
420
421
422
423tcllib                               1.0.1                  pt::peg::interp(n)
Impressum