1pt::pegrammar(n)                 Parser Tools                 pt::pegrammar(n)
2
3
4
5______________________________________________________________________________
6

NAME

8       pt::pegrammar - Introduction to Parsing Expression Grammars
9

SYNOPSIS

11       package require Tcl  8.5
12
13______________________________________________________________________________
14

DESCRIPTION

16       Are  you  lost ?  Do you have trouble understanding this document ?  In
17       that case please read the overview  provided  by  the  Introduction  to
18       Parser  Tools.  This document is the entrypoint to the whole system the
19       current package is a part of.
20
21       Welcome to the introduction  to  Parsing  Expression  Grammars  (short:
22       PEG),  the  formalism used by the Parser Tools.  It is assumed that the
23       reader has a basic knowledge of parsing theory, i.e. Context-Free Gram‐
24       mars  (short:  CFG), languages, and associated terms like LL(k), LR(k),
25       terminal and nonterminal symbols, etc.  We do not intend  to  recapitu‐
26       late   such   basic   definitions  or  terms  like  useful,  reachable,
27       (left/right) recursive, nullable, first/last/follow sets, etc.   Please
28       see  the References at the end instead if you are in need of places and
29       books which provide such background information.
30
31       PEGs are formally very similar to CFGs, with terminal  and  nonterminal
32       symbols, start symbol, and rules defining the structure of each nonter‐
33       minal symbol.  The main difference lies in the choice(sic!)  of  choice
34       operators. Where CFGs use an unordered choice to represent alternatives
35       PEGs use prioritized choice. Which is fancy way of saying that a parser
36       has  to  try the first alternative first and can try the other alterna‐
37       tives if only if it fails for the first, and so on.
38
39       On the CFG side this gives rise to  LL(k)  and  LR(k)  for  making  the
40       choice  deterministic  with  a bounded lookahead of k terminal symbols,
41       where LL is in essence topdown aka recursive descent  parsing,  and  LR
42       bottomup aka shift reduce parsing.
43
44       On  the  PEG  side  we can parse input with recursive descent and back‐
45       tracking of failed choices, the latter of which  amounts  to  unlimited
46       lookahead.  By additionally recording the success or failure of nonter‐
47       minals at the specific locations they were tried at  and  reusing  this
48       information  after  backtracking we can avoid the exponential blowup of
49       running time usually associated with backtracking and keep the  parsing
50       linear. The memory requirements are of course higher due to this cache,
51       as we are trading space for time.
52
53       This is the basic concept behind packrat parsers.
54
55       A limitation pure PEGs share with LL(k)  CFGs  is  that  left-recursive
56       grammars cannot be parsed, with the associated recursive descent parser
57       entering an infinite recursion.  This limitation is usually overcome by
58       extending pure PEGs with explicit operators to specify repetition, zero
59       or more, and one or more, or, formally spoken, for the  kleene  closure
60       and positive kleene closure.  This is what the Parser Tools are doing.
61
62       Another  extension,  specific  to  Parser  Tools, is a set of operators
63       which map more or less directly to various character classes built into
64       Tcl, i.e. the classes reachable via string is.
65
66       The  remainder  of  this  document consists of the formal definition of
67       PEGs for the mathematically inclined, and an  appendix  listing  refer‐
68       ences to places with more information on PEGs specifically, and parsing
69       in general.
70

FORMAL DEFINITION

72       For the mathematically inclined, a  Parsing  Expression  Grammar  is  a
73       4-tuple (VN,VT,R,eS) where
74
75       ·      VN is a set of nonterminal symbols,
76
77       ·      VT is a set of terminal symbols,
78
79       ·      R  is  a finite set of rules, where each rule is a pair (A,e), A
80              in VN, and e a parsing expression.
81
82       ·      eS is a parsing expression, the start expression.
83
84       Further constraints are
85
86       ·      The intersection of VN and VT is empty.
87
88       ·      For all A in VT exists exactly one pair (A,e)  in  R.  In  other
89              words,  R  is  a  function  from  nonterminal symbols to parsing
90              expressions.
91
92       Parsing expressions are inductively defined via
93
94       ·      The empty string (epsilon) is a parsing expression.
95
96       ·      A terminal symbol a is a parsing expression.
97
98       ·      A nonterminal symbol A is a parsing expression.
99
100       ·      e1e2 is a parsing expression for parsing expressions e1  and  2.
101              This is called sequence.
102
103       ·      e1/e2  is a parsing expression for parsing expressions e1 and 2.
104              This is called ordered choice.
105
106       ·      e* is a parsing expression for parsing  expression  e.  This  is
107              called zero-or-more repetitions, also known as kleene closure.
108
109       ·      e+  is  a  parsing  expression for parsing expression e. This is
110              called one-or-more repetitions, also known  as  positive  kleene
111              closure.
112
113       ·      !e  is  a  parsing expression for parsing expression e1. This is
114              called a not lookahead predicate.
115
116       ·      &e is a parsing expression for parsing expression  e1.  This  is
117              called an and lookahead predicate.
118
119       PEGs  are used to define a grammatical structure for streams of symbols
120       over VT. They are a modern phrasing of  older  formalisms  invented  by
121       Alexander  Birham.  These  formalisms  were  called TS (TMG recognition
122       scheme), and gTS (generalized TS). Later  they  were  renamed  to  TPDL
123       (Top-Down Parsing Languages) and gTPDL (generalized TPDL).
124
125       They  can be easily implemented by recursive descent parsers with back‐
126       tracking. This makes them relatives of LL(k) Context-Free Grammars.
127

REFERENCES

129       [1]    The  Packrat  Parsing  and  Parsing  Expression  Grammars   Page
130              [http://www.pdos.lcs.mit.edu/~baford/packrat/],  by  Bryan Ford,
131              Massachusetts Institute of Technology. This is  the  main  entry
132              page to PEGs, and their realization through Packrat Parsers.
133
134       [2]    http://en.wikipedia.org/wiki/Parsing_expression_grammar
135              Wikipedia's entry about Parsing Expression Grammars.
136
137       [3]    Parsing      Techniques      -      A      Practical       Guide
138              [http://www.cs.vu.nl/~dick/PTAPG.html],  an online book offering
139              a clear, accessible, and thorough discussion of  many  different
140              parsing  techniques  with  their interrelations and applicabili‐
141              ties, including error recovery techniques.
142
143       [4]    Compilers and Compiler  Generators  [http://scifac.ru.ac.za/com
144              pilers/], an online book using CoCo/R, a generator for recursive
145              descent parsers.
146

BUGS, IDEAS, FEEDBACK

148       This document, and the package it describes, will  undoubtedly  contain
149       bugs  and other problems.  Please report such in the category pt of the
150       Tcllib Trackers  [http://core.tcl.tk/tcllib/reportlist].   Please  also
151       report  any  ideas  for  enhancements  you  may have for either package
152       and/or documentation.
153
154       When proposing code changes, please provide unified diffs, i.e the out‐
155       put of diff -u.
156
157       Note  further  that  attachments  are  strongly  preferred over inlined
158       patches. Attachments can be made by going  to  the  Edit  form  of  the
159       ticket  immediately  after  its  creation, and then using the left-most
160       button in the secondary navigation bar.
161

KEYWORDS

163       EBNF, LL(k), PEG, TDPL, context-free  languages,  expression,  grammar,
164       matching,  parser, parsing expression, parsing expression grammar, push
165       down automaton, recursive descent, state, top-down  parsing  languages,
166       transducer
167

CATEGORY

169       Parsing and Grammars
170
172       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>
173
174
175
176
177tcllib                                 1                      pt::pegrammar(n)
Impressum