pt::pe::op(n)                    Parser Tools                    pt::pe::op(n)

______________________________________________________________________________

NAME

       pt::pe::op - Parsing Expression Utilities

SYNOPSIS

       package require Tcl  8.5

       package require pt::pe::op  ?1.0.1?

       package require pt::pe  ?1?

       package require struct::set

       ::pt::pe::op drop dropset pe

       ::pt::pe::op rename nt ntnew pe

       ::pt::pe::op called pe

       ::pt::pe::op flatten pe

       ::pt::pe::op fusechars pe

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In that
       case please read the overview provided by the Introduction to Parser
       Tools. That document is the entry point to the whole system the
       current package is a part of.

       This package provides additional commands to work with the
       serializations of parsing expressions as managed by the PEG and
       related packages, and specified in section PE serialization format.

       This is an internal package, for use by the higher level packages
       handling PEGs, their conversion into and out of various other formats,
       or other uses.

API

       ::pt::pe::op drop dropset pe
              This command removes all occurrences of any of the nonterminal
              symbols in the set dropset from the parsing expression pe, and
              simplifies it. This may result in the expression becoming
              "epsilon", i.e. matching the empty string.

       ::pt::pe::op rename nt ntnew pe
              This command renames all occurrences of the nonterminal nt in
              the parsing expression pe into ntnew.

       ::pt::pe::op called pe
              This command extracts the set of all nonterminal  symbols  used,
              i.e. 'called', in the parsing expression pe.

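              The extraction described above can be sketched in plain Tcl as
              a recursive walk over the serialization. This is an
              illustrative sketch only, not the package's implementation. The
              helper name sketchCalled is hypothetical; with the package
              loaded, the equivalent call would be ::pt::pe::op called $pe.
              The sketch returns symbols in first-seen order, which is an
              assumption of this example and not a documented property of
              the package.

```tcl
# Hypothetical helper, not part of pt::pe::op: collect the nonterminal
# symbols referenced by a serialized parsing expression.
proc sketchCalled {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        n {
            # {n A} references the nonterminal A.
            return [list [lindex $pe 1]]
        }
        / - x - * - + - & - ! - ? {
            # Operators: merge the symbols found in all children.
            set symbols {}
            foreach child [lrange $pe 1 end] {
                foreach s [sketchCalled $child] {
                    if {$s ni $symbols} { lappend symbols $s }
                }
            }
            return $symbols
        }
        default {
            # Terminals and named character classes call nothing.
            return {}
        }
    }
}

set pe {x {n Term} {* {x {n AddOp} {n Term}}}}
puts [sketchCalled $pe]   ;# prints: Term AddOp
```
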
       ::pt::pe::op flatten pe
              This  command  transforms  the parsing expression by eliminating
              sequences nested in sequences, and choices in  choices,  lifting
              the  children  of the nested expression into the parent. It fur‐
              ther eliminates all sequences and choices with only  one  child,
              as these are redundant.

              The  resulting  parsing  expression is returned as the result of
              the command.

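              The two rewriting rules above (lift same-operator children,
              drop single-child sequences and choices) can be sketched in
              plain Tcl as follows. This is a minimal sketch under the
              assumption that the rules are applied bottom-up; it is not the
              package's implementation, and the helper name sketchFlatten is
              hypothetical. With the package loaded, the actual call would
              be ::pt::pe::op flatten $pe.

```tcl
# Hypothetical helper, not part of pt::pe::op: flatten nested sequences
# and choices in a serialized parsing expression.
proc sketchFlatten {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        / - x {
            set children {}
            foreach child [lrange $pe 1 end] {
                set child [sketchFlatten $child]
                if {[lindex $child 0] eq $op} {
                    # Same operator as the parent: lift the grandchildren.
                    lappend children {*}[lrange $child 1 end]
                } else {
                    lappend children $child
                }
            }
            # A sequence or choice with one child is just that child.
            if {[llength $children] == 1} {
                return [lindex $children 0]
            }
            return [list $op {*}$children]
        }
        * - + - & - ! - ? {
            # Unary operators: flatten the single child.
            return [list $op [sketchFlatten [lindex $pe 1]]]
        }
        default {
            # Atomic expressions are returned unchanged.
            return $pe
        }
    }
}

puts [sketchFlatten {x {x {n A} {n B}} {/ {n C}} {n D}}]
# prints: x {n A} {n B} {n C} {n D}
```
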
       ::pt::pe::op fusechars pe
              This command transforms the parsing expression by fusing
              adjacent terminals in sequences, and adjacent terminals and
              ranges in choices. In doing so it (re)constructs high-level
              strings and character classes.

              The resulting pseudo-parsing expression is returned as the
              result of the command and may contain the pseudo-operators str
              for character sequences, aka strings, and cl for character
              choices, aka character classes.

              The result is called a pseudo-parsing expression because it is
              not a true parsing expression anymore, and will fail a check
              with ::pt::peg verify if the new pseudo-operators are present in
              the result, but is otherwise of sound structure for a parsing
              expression. Notably, the commands ::pt::peg bottomup and
              ::pt::peg topdown will process them without trouble.

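       The commands of this section might be exercised as in the following
       sketch. It assumes that tcllib is installed so that pt::pe::op can be
       loaded; if it is not, the script only reports that. The input
       expressions and the nonterminal name Factor are made up for
       illustration, and the results are printed rather than stated here,
       as their exact form is determined by the package.

```tcl
# Usage sketch. Assumes tcllib (and with it pt::pe::op) is installed.
if {[catch {package require pt::pe::op} msg]} {
    puts "pt::pe::op is not available: $msg"
} else {
    # Serialization of: Term (AddOp Term)*
    set pe {x {n Term} {* {x {n AddOp} {n Term}}}}

    # Set of nonterminals used by the expression.
    puts [::pt::pe::op called $pe]

    # Rename Term to the (hypothetical) nonterminal Factor.
    puts [::pt::pe::op rename Term Factor $pe]

    # Remove all uses of AddOp and simplify.
    puts [::pt::pe::op drop {AddOp} $pe]

    # Lift the nested sequence into its parent.
    puts [::pt::pe::op flatten {x {x {n A} {n B}} {n C}}]

    # Fuse adjacent terminals; the result may use the pseudo-operator str.
    puts [::pt::pe::op fusechars {x {t a} {t b} {t c}}]
}
```
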

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison, etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an  atomic  parsing  expres‐
                            sion. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression. It
                            matches any character.

                     [3]    The string alnum is an atomic parsing  expression.
                            It  matches  any Unicode alphabet or digit charac‐
                            ter. This is a custom extension of  PEs  based  on
                            Tcl's builtin command string is.

                     [4]    The  string alpha is an atomic parsing expression.
                            It matches any Unicode alphabet character. This is
                            a  custom  extension of PEs based on Tcl's builtin
                            command string is.

                     [5]    The string ascii is an atomic parsing  expression.
                            It matches any Unicode character below U+0080. This
                            is a  custom  extension  of  PEs  based  on  Tcl's
                            builtin command string is.

                     [6]    The  string  control  is an atomic parsing expres‐
                            sion. It matches any  Unicode  control  character.
                            This  is  a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [7]    The string digit is an atomic parsing  expression.
                            It  matches any Unicode digit character. Note that
                            this includes characters  outside  of  the  [0..9]
                            range.  This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing  expression.
                            It  matches any Unicode printing character, except
                            for space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [9]    The  string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabet charac‐
                            ter.  This  is  a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing  expression.
                            It matches any Unicode printing character, includ‐
                            ing space. This is a custom extension of PEs based
                            on Tcl's builtin command string is.

                     [11]   The  string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character. This
                            is  a  custom  extension  of  PEs  based  on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing  expression.
                            It  matches any Unicode space character. This is a
                            custom extension of PEs  based  on  Tcl's  builtin
                            command string is.

                     [13]   The  string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabet charac‐
                            ter.  This  is  a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic  parsing  expres‐
                            sion.  It matches any Unicode word character. This
                            is any alphanumeric character (see alnum), and any
                            connector  punctuation  characters  (e.g.   under‐
                            score). This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [15]   The string xdigit is an atomic parsing expression.
                            It matches any hexadecimal digit  character.  This
                            is  a  custom  extension  of  PEs  based  on Tcl's
                            builtin command string is.

                     [16]   The string ddigit is an atomic parsing expression.
                            It  matches any decimal digit character. This is a
                            custom extension of PEs  based  on  Tcl's  builtin
                            command regexp.

                     [17]   The  expression  [list  t  x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A]  is  an  atomic  parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For  parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is  a  parsing  expression  as
                            well.  This is the ordered choice, aka prioritized
                            choice.

                     [2]    For parsing expressions e1, e2, ... the result  of
                            [list  x  e1  e2  ... ] is a parsing expression as
                            well.  This is the sequence.

                     [3]    For a parsing expression e the result of  [list  *
                            e]  is  a parsing expression as well.  This is the
                            Kleene closure, describing zero  or  more  repeti‐
                            tions.

                     [4]    For  a  parsing expression e the result of [list +
                            e] is a parsing expression as well.  This  is  the
                            positive  Kleene  closure,  describing one or more
                            repetitions.

                     [5]    For a parsing expression e the result of  [list  &
                            e]  is  a parsing expression as well.  This is the
                            and lookahead predicate.

                     [6]    For a parsing expression e the result of  [list  !
                            e]  is  a parsing expression as well.  This is the
                            not lookahead predicate.

                     [7]    For a parsing expression e the result of  [list  ?
                            e]  is  a parsing expression as well.  This is the
                            optional input.

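              As the rules above are phrased in terms of Tcl's list command,
              a serialization can be assembled directly with it. The
              following sketch uses only plain Tcl; the rule Number and the
              variable names are hypothetical and serve as illustration
              only.

```tcl
# Hypothetical rule, for illustration only:  Number <- '-'? digit+
set sign   [list ? [list t -]]      ;# optional terminal "-"
set digits [list + digit]           ;# one or more of the atomic class digit
set number [list x $sign $digits]   ;# sequence of both parts

puts $number   ;# prints: x {? {t -}} {+ digit}

# An ordered choice between two terminals under a not lookahead predicate.
set notAB [list ! [list / [list t a] [list t b]]]
puts $notAB    ;# prints: ! {/ {t a} {t b}}
```
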
       Canonical serialization
              The canonical serialization of a parsing expression has the for‐
              mat  as  specified  in  the previous item, and then additionally
              satisfies the constraints below, which make it unique among  all
              the possible serializations of this parsing expression.

              [1]    The  string  representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not  con‐
                     tain superfluous whitespace.

              [2]    Terminals  are not encoded as ranges (where start and end
                     of the range are identical).

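              Constraint [1] can be illustrated by rebuilding a regular
              serialization with Tcl's list command, which yields the
              canonical list representation at every nesting level. This is
              a sketch in plain Tcl; the helper name sketchCanonical is
              hypothetical and not part of any package, and it deliberately
              does not address constraint [2].

```tcl
# Hypothetical helper: normalize the whitespace of a serialized parsing
# expression by rebuilding every nested list with [list].
proc sketchCanonical {pe} {
    set op [lindex $pe 0]
    switch -exact -- $op {
        / - x - * - + - & - ! - ? {
            # Operators: rebuild the list from normalized children.
            set res [list $op]
            foreach child [lrange $pe 1 end] {
                lappend res [sketchCanonical $child]
            }
            return $res
        }
        default {
            # Atomic expressions: re-quote the elements as they are.
            return [list {*}$pe]
        }
    }
}

set regular "x   {n  Term}  { *  { x {n AddOp}   {n Term}}}"
puts [sketchCanonical $regular]
# prints: x {n Term} {* {x {n AddOp} {n Term}}}
```
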
   EXAMPLE
       Assume the parsing expression shown on the right-hand side of the
       rule

                  Expression <- Term (AddOp Term)*

       Its canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}

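       The serialization of this example can be reproduced and inspected
       with plain Tcl list commands, as sketched below. The variable names
       are chosen for illustration. Note that the outer braces shown in the
       example are the quoting of the whole value as a single Tcl word; the
       value itself is the list starting with x.

```tcl
# Build the serialization of: Term (AddOp Term)*
set term [list n Term]
set pe   [list x $term [list * [list x [list n AddOp] $term]]]
puts $pe               ;# prints: x {n Term} {* {x {n AddOp} {n Term}}}

# The operator is the first list element, the children are the rest.
puts [lindex $pe 0]    ;# prints: x
puts [llength $pe]     ;# prints: 3
puts [lindex $pe 2 0]  ;# prints: *
```
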

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       EBNF,  LL(k),  PEG,  TDPL, context-free languages, expression, grammar,
       matching, parser, parsing expression, parsing expression grammar,  push
       down  automaton,  recursive descent, state, top-down parsing languages,
       transducer

CATEGORY

       Parsing and Grammars

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                               1.0.1                       pt::pe::op(n)