pt::pe(n)                        Parser Tools                        pt::pe(n)

______________________________________________________________________________

NAME

       pt::pe - Parsing Expression Serialization

SYNOPSIS

       package require Tcl  8.5

       package require pt::pe  ?1.0.1?

       package require char

       ::pt::pe verify serial ?canonvar?

       ::pt::pe verify-as-canonical serial

       ::pt::pe canonicalize serial

       ::pt::pe print serial

       ::pt::pe bottomup cmdprefix pe

       cmdprefix pe op arguments

       ::pt::pe topdown cmdprefix pe

       ::pt::pe equal seriala serialb

       ::pt::pe epsilon

       ::pt::pe dot

       ::pt::pe alnum

       ::pt::pe alpha

       ::pt::pe ascii

       ::pt::pe control

       ::pt::pe digit

       ::pt::pe graph

       ::pt::pe lower

       ::pt::pe print

       ::pt::pe punct

       ::pt::pe space

       ::pt::pe upper

       ::pt::pe wordchar

       ::pt::pe xdigit

       ::pt::pe ddigit

       ::pt::pe terminal t

       ::pt::pe range ta tb

       ::pt::pe nonterminal nt

       ::pt::pe choice pe...

       ::pt::pe sequence pe...

       ::pt::pe repeat0 pe

       ::pt::pe repeat1 pe

       ::pt::pe optional pe

       ::pt::pe ahead pe

       ::pt::pe notahead pe

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. That document is the entry point to the whole system
       the current package is a part of.

       This package provides commands to work with the serializations of
       parsing expressions as managed by the Parser Tools, and specified in
       section PE serialization format.

       This is a supporting package in the Core Layer of Parser Tools.

       IMAGE: arch_core_support

API

       ::pt::pe verify serial ?canonvar?
              This command verifies that the content of serial is a valid
              serialization of a parsing expression and throws an error if
              that is not the case. The result of the command is the empty
              string.

              If the argument canonvar is specified it is interpreted as the
              name of a variable in the calling context. This variable is
              written to if and only if serial is a valid regular
              serialization. Its value is a boolean, with true indicating
              that the serialization is not only valid, but also canonical.
              False is written for a valid, but non-canonical serialization.

              For the specification of serializations see the section PE
              serialization format.
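
              A minimal sketch of calling verify, assuming tcllib's pt::pe
              package is installed. The variable names and the sample
              values are illustrative and not part of the package.

```tcl
package require Tcl 8.5
package require pt::pe

# A valid serialization, written with superfluous whitespace at the
# top level, so it is regular but not canonical.
set sloppy "x   {n Term}   {n AddOp}"

# verify returns the empty string on success and stores the
# canonicity flag in the variable named by the second argument.
pt::pe verify $sloppy isCanonical
puts "valid, canonical: $isCanonical"

# For a value that is not a serialization verify throws an error,
# which catch turns into a message.
if {[catch {pt::pe verify {bogus-operator a b}} msg]} {
    puts "rejected: $msg"
}
```

              Wrapping the call in catch keeps a malformed value from
              aborting the calling script.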

       ::pt::pe verify-as-canonical serial
              This command verifies that the content of serial is a valid
              canonical serialization of a parsing expression and throws an
              error if that is not the case. The result of the command is
              the empty string.

              For the specification of canonical serializations see the
              section PE serialization format.

       ::pt::pe canonicalize serial
              This command assumes that the content of serial is a valid
              regular serialization of a parsing expression and throws an
              error if that is not the case.

              It then converts the input into the canonical serialization
              of this parsing expression and returns it as its result. If
              the input is already canonical it is returned unchanged.

              For the specification of regular and canonical serializations
              see the section PE serialization format.
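
              The conversion can be sketched as follows, assuming the
              package is installed. The input value and the variable names
              are made up for illustration.

```tcl
package require Tcl 8.5
package require pt::pe

# Regular, but not canonical: extra whitespace between the elements.
set sloppy "x   {n Term}   {n AddOp}"

# canonicalize returns the canonical form of the same expression.
set canon [pt::pe canonicalize $sloppy]
puts $canon

# The result passes the stricter check without raising an error.
pt::pe verify-as-canonical $canon
```

              Canonicalizing values before storing or comparing them means
              plain string comparison can be used on them later.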

       ::pt::pe print serial
              This command assumes that the argument serial contains a
              valid serialization of a parsing expression and returns a
              string containing that PE in a human readable form.

              The exact format of this form is not specified and cannot be
              relied on for parsing or other machine-based activities.

              For the specification of serializations see the section PE
              serialization format.
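
              A short sketch of printing an expression for debugging,
              assuming the package is installed. Because the output format
              is unspecified, the example only displays the text and does
              not inspect it.

```tcl
package require Tcl 8.5
package require pt::pe

# Serialization of: Term (AddOp Term)*
set pe {x {n Term} {* {x {n AddOp} {n Term}}}}

# print returns the human readable text; it does not write it anywhere.
set text [pt::pe print $pe]
puts $text
```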

       ::pt::pe bottomup cmdprefix pe
              This command walks the parsing expression pe from the bottom
              up to the root, invoking the command prefix cmdprefix for each
              partial expression. This implies that the children of a
              parsing expression PE are handled before PE.

              The command prefix has the signature

              cmdprefix pe op arguments
                     I.e. it is invoked with the parsing expression pe the
                     walk is currently at, the operator op of that pe, and
                     the operator's arguments.

                     The result returned by the command prefix replaces pe
                     in the parsing expression it was a child of, allowing
                     transformations of the expression tree.

                     This also means that for all inner parsing expressions
                     the contents of arguments are the results of the
                     command prefix invoked for the children of this inner
                     parsing expression.
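
              As an illustration of such a transformation, the sketch below
              renames a nonterminal throughout an expression. The procedure
              rename-nt and the names Term and Factor are hypothetical; only
              the callback signature is taken from the description above.

```tcl
package require Tcl 8.5
package require pt::pe

# Callback: the first two arguments come from the command prefix, the
# remaining three are supplied by the walk.
proc rename-nt {from to pe op arguments} {
    if {$op eq "n" && [lindex $arguments 0] eq $from} {
        # Replace the matching nonterminal.
        return [pt::pe nonterminal $to]
    }
    # Everything else is rebuilt from the operator and its arguments,
    # which for inner expressions are the already transformed children.
    return [list $op {*}$arguments]
}

set pe {x {n Term} {* {x {n AddOp} {n Term}}}}
set renamed [pt::pe bottomup {rename-nt Term Factor} $pe]
puts $renamed
```

              Rebuilding every node from op and arguments is what carries
              the replacements made in the children up to the root.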

       ::pt::pe topdown cmdprefix pe
              This command walks the parsing expression pe from the root
              down to the leaves, invoking the command prefix cmdprefix for
              each partial expression. This implies that the children of a
              parsing expression PE are handled after PE.

              The command prefix has the same signature as for bottomup,
              see above.

              The result returned by the command prefix is ignored.
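
              Since the callback result is ignored, a top-down walk suits
              pure inspection. The sketch below collects the nonterminals
              used in an expression. The procedure collect-nt and the global
              variable seen are hypothetical names.

```tcl
package require Tcl 8.5
package require pt::pe

# Callback: append the symbol of every nonterminal to the global
# variable whose name is given by the command prefix.
proc collect-nt {var pe op arguments} {
    if {$op ne "n"} { return }
    upvar #0 $var names
    lappend names [lindex $arguments 0]
    return
}

set seen {}
pt::pe topdown {collect-nt seen} {x {n Term} {* {x {n AddOp} {n Term}}}}

# Reduce the collected symbols to a sorted list without duplicates.
puts [lsort -unique $seen]
```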

       ::pt::pe equal seriala serialb
              This command tests the two parsing expressions seriala and
              serialb for structural equality. The result of the command is
              a boolean value. It is true if the expressions are
              structurally identical, and false otherwise.

              String equality is usable only if we can assume that the two
              parsing expressions are pure Tcl lists.
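
              The difference between the two kinds of comparison can be
              sketched as follows, assuming the package is installed. The
              two sample values are illustrative and describe the same
              expression with different whitespace.

```tcl
package require Tcl 8.5
package require pt::pe

set a {x {n Term} {n AddOp}}
set b "x   {n Term}   {n AddOp}"

# The string representations differ, so a plain string comparison
# reports a mismatch.
set sameString [string equal $a $b]

# The structural comparison looks at the expressions themselves.
set sameStructure [pt::pe equal $a $b]

puts "string: $sameString, structural: $sameStructure"
```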

       ::pt::pe epsilon
              This command constructs the atomic parsing expression for
              epsilon.

       ::pt::pe dot
              This command constructs the atomic parsing expression for dot.

       ::pt::pe alnum
              This command constructs the atomic parsing expression for
              alnum.

       ::pt::pe alpha
              This command constructs the atomic parsing expression for
              alpha.

       ::pt::pe ascii
              This command constructs the atomic parsing expression for
              ascii.

       ::pt::pe control
              This command constructs the atomic parsing expression for
              control.

       ::pt::pe digit
              This command constructs the atomic parsing expression for
              digit.

       ::pt::pe graph
              This command constructs the atomic parsing expression for
              graph.

       ::pt::pe lower
              This command constructs the atomic parsing expression for
              lower.

       ::pt::pe print
              This command constructs the atomic parsing expression for
              print.

       ::pt::pe punct
              This command constructs the atomic parsing expression for
              punct.

       ::pt::pe space
              This command constructs the atomic parsing expression for
              space.

       ::pt::pe upper
              This command constructs the atomic parsing expression for
              upper.

       ::pt::pe wordchar
              This command constructs the atomic parsing expression for
              wordchar.

       ::pt::pe xdigit
              This command constructs the atomic parsing expression for
              xdigit.

       ::pt::pe ddigit
              This command constructs the atomic parsing expression for
              ddigit.

       ::pt::pe terminal t
              This command constructs the atomic parsing expression for the
              terminal symbol t.

       ::pt::pe range ta tb
              This command constructs the atomic parsing expression for the
              range of terminal symbols ta ... tb.

       ::pt::pe nonterminal nt
              This command constructs the atomic parsing expression for the
              nonterminal symbol nt.

       ::pt::pe choice pe...
              This command constructs the parsing expression representing
              the ordered or prioritized choice between the argument parsing
              expressions. The first argument has the highest priority.

       ::pt::pe sequence pe...
              This command constructs the parsing expression representing
              the sequence of the argument parsing expressions. The first
              argument is the first element of the sequence.

       ::pt::pe repeat0 pe
              This command constructs the parsing expression representing
              zero or more repetitions of the argument parsing expression
              pe, also known as the Kleene closure.

       ::pt::pe repeat1 pe
              This command constructs the parsing expression representing
              one or more repetitions of the argument parsing expression pe,
              also known as the positive Kleene closure.

       ::pt::pe optional pe
              This command constructs the parsing expression representing
              the optionality of the argument parsing expression pe.

       ::pt::pe ahead pe
              This command constructs the parsing expression representing
              the positive lookahead of the argument parsing expression pe.

       ::pt::pe notahead pe
              This command constructs the parsing expression representing
              the negative lookahead of the argument parsing expression pe.
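
              The constructor commands can be combined to build larger
              expressions without writing serializations by hand. The
              sketch below builds the right-hand side of a hypothetical
              rule Expression <- Term (AddOp Term)*, assuming the package
              is installed. The variable names are illustrative.

```tcl
package require Tcl 8.5
package require pt::pe

# Inner part: AddOp Term
set tail [pt::pe sequence \
              [pt::pe nonterminal AddOp] \
              [pt::pe nonterminal Term]]

# Whole right-hand side: Term (AddOp Term)*
set pe [pt::pe sequence \
            [pt::pe nonterminal Term] \
            [pt::pe repeat0 $tail]]

# Show the resulting serialization and check that it is valid.
puts $pe
pt::pe verify $pe
```

              Using the constructors keeps the operator tags of the
              serialization format out of the calling code.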

PE SERIALIZATION FORMAT

       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison,
       etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression.
                            It matches any character.

                     [3]    The string alnum is an atomic parsing
                            expression. It matches any Unicode alphabet or
                            digit character. This is a custom extension of
                            PEs based on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing
                            expression. It matches any Unicode alphabet
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [5]    The string ascii is an atomic parsing
                            expression. It matches any Unicode character
                            below U+0080. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing
                            expression. It matches any Unicode digit
                            character. Note that this includes characters
                            outside of the [0..9] range. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [8]    The string graph is an atomic parsing
                            expression. It matches any Unicode printing
                            character, except for space. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [9]    The string lower is an atomic parsing
                            expression. It matches any Unicode lower-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing
                            expression. It matches any Unicode printing
                            character, including space. This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [11]   The string punct is an atomic parsing
                            expression. It matches any Unicode punctuation
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [12]   The string space is an atomic parsing
                            expression. It matches any Unicode space
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [13]   The string upper is an atomic parsing
                            expression. It matches any Unicode upper-case
                            alphabet character. This is a custom extension
                            of PEs based on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word
                            character. This is any alphanumeric character
                            (see alnum), and any connector punctuation
                            characters (e.g. underscore). This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [15]   The string xdigit is an atomic parsing
                            expression. It matches any hexadecimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [16]   The string ddigit is an atomic parsing
                            expression. It matches any decimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result
                            of [list / e1 e2 ... ] is a parsing expression
                            as well. This is the ordered choice, aka
                            prioritized choice.

                     [2]    For parsing expressions e1, e2, ... the result
                            of [list x e1 e2 ... ] is a parsing expression
                            as well. This is the sequence.

                     [3]    For a parsing expression e the result of
                            [list * e] is a parsing expression as well.
                            This is the Kleene closure, describing zero or
                            more repetitions.

                     [4]    For a parsing expression e the result of
                            [list + e] is a parsing expression as well.
                            This is the positive Kleene closure, describing
                            one or more repetitions.

                     [5]    For a parsing expression e the result of
                            [list & e] is a parsing expression as well.
                            This is the and lookahead predicate.

                     [6]    For a parsing expression e the result of
                            [list ! e] is a parsing expression as well.
                            This is the not lookahead predicate.

                     [7]    For a parsing expression e the result of
                            [list ? e] is a parsing expression as well.
                            This is the optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then
              additionally satisfies the constraints below, which make it
              unique among all the possible serializations of this parsing
              expression.

              [1]    The string representation of the value is the
                     canonical representation of a pure Tcl list. I.e. it
                     does not contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and
                     end of the range are identical).

   EXAMPLE
       For the parsing expression shown on the right-hand side of the rule

                  Expression <- Term (AddOp Term)*


       the canonical serialization (except for whitespace) is

                  {x {n Term} {* {x {n AddOp} {n Term}}}}
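
       The same value can be assembled with Tcl's list command, following
       the rules of the regular serialization. This is only a sketch: it
       assumes the package is installed, and the variable names are
       illustrative.

```tcl
package require Tcl 8.5
package require pt::pe

# AddOp Term, as a sequence of two nonterminals.
set addTerm [list x [list n AddOp] [list n Term]]

# Term (AddOp Term)*, using * for the zero-or-more repetition.
set pe [list x [list n Term] [list * $addTerm]]

# Building the value with list yields a pure Tcl list, so the strict
# check accepts it without raising an error.
pt::pe verify-as-canonical $pe
puts $pe
```

       The printed value lacks the outer braces shown above because those
       only quote the list when it is written as a literal in a script.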

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS

       EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
       matching, parser, parsing expression, parsing expression grammar,
       push down automaton, recursive descent, state, top-down parsing
       languages, transducer

CATEGORY

       Parsing and Grammars

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                               1.0.1                           pt::pe(n)