1`grammar::peg(n) Grammar operations and usage grammar::peg(n)`

2`3`

`4`

`5`

`______________________________________________________________________________`

6

## NAME

8`grammar::peg - Create and manipulate parsing expression grammars`

9

## SYNOPSIS

11`package require Tcl 8.4`

12`13`

`package require snit`

14`15`

`package require grammar::peg ?0.1?`

16`17`

`::grammar::peg pegName ?=|:=|<--|as|deserialize src?`

18`19`

`pegName destroy`

20`21`

`pegName clear`

22`23`

`pegName = srcPEG`

24`25`

`pegName --> dstPEG`

26`27`

`pegName serialize`

28`29`

`pegName deserialize serialization`

30`31`

`pegName is valid`

32`33`

`pegName start ?pe?`

34`35`

`pegName nonterminals`

36`37`

`pegName nonterminal add nt pe`

38`39`

`pegName nonterminal delete nt1 ?nt2 ...?`

40`41`

`pegName nonterminal exists nt`

42`43`

`pegName nonterminal rename nt ntnew`

44`45`

`pegName nonterminal mode nt ?mode?`

46`47`

`pegName nonterminal rule nt`

48`49`

`pegName unknown nonterminals`

50`51`

`______________________________________________________________________________`

52

## DESCRIPTION

54`This package provides a container class for parsing expression grammars`

55`(Short: PEG). It allows the incremental definition of the grammar, its`

56`manipulation and querying of the definition. The package neither pro‐`

57`vides complex operations on the grammar, nor has it the ability to exe‐`

58`cute a grammar definition for a stream of symbols. Two packages`

59`related to this one are grammar::mengine and grammar::peg::interpreter.`

60`The first of them defines a general virtual machine for the matching of`

61`a character stream, and the second implements an interpreter for pars‐`

62`ing expression grammars on top of that virtual machine.`

63`64`

`TERMS & CONCEPTS`

65`PEGs are similar to context-free grammars, but not equivalent; in some`

66`cases PEGs are strictly more powerful than context-free grammars (there`

67`exist PEGs for some non-context-free languages). The formal mathemati‐`

68`cal definition of parsing expressions and parsing expression grammars`

69`can be found in section PARSING EXPRESSION GRAMMARS.`

70`71`

`In short, we have terminal symbols, which are the most basic building`

72`blocks for sentences, and nonterminal symbols with associated parsing`

73`expressions, defining the grammatical structure of the sentences. The`

74`two sets of symbols are distinctive, and do not overlap. When speaking`

75`about symbols the word "symbol" is often left out. The union of the`

76`sets of terminal and nonterminal symbols is called the set of symbols.`

77`78`

`Here the set of terminal symbols is not explicitly managed, but implic‐`

79`itly defined as the set of all characters. Note that this means that we`

80`inherit from Tcl the ability to handle all of Unicode.`

81`82`

`A pair of nonterminal and parsing expression is also called a grammati‐`

83`cal rule, or rule for short. In the context of a rule the nonterminal`

84`is often called the left-hand-side (LHS), and the parsing expression`

85`the right-hand-side (RHS).`

86`87`

`The start expression of a grammar is a parsing expression from which`

88`all the sentences contained in the language specified by the grammar`

89`are derived. To make the understanding of this term easier let us`

90`assume for a moment that the RHS of each rule, and the start expres‐`

91`sion, is either a sequence of symbols, or a series of alternate parsing`

92`expressions. In the latter case the rule can be seen as a set of`

93`rules, each providing one alternative for the nonterminal. A parsing`

94`expression A' is now a derivation of a parsing expression A if we pick`

95`one of the nonterminals N in the expression, and one of the alternative`

96`rules R for N, and then replace the nonterminal in A with the RHS of`

97`the chosen rule. Here we can see why the terminal symbols are called`

98`such. They cannot be expanded any further, thus terminate the process`

99`of deriving new expressions. An example`

100`101`

`Rules`

102`(1) A <- a B c`

103`(2a) B <- d B`

104`(2b) B <- e`

105`106`

`Some derivations, using starting expression A.`

107`108`

`A -/1/-> a B c -/2a/-> a d B c -/2b/-> a d e c`

109`110`

`111`

`A derived expression containing only terminal symbols is a sentence.`

112`The set of all sentences which can be derived from the start expression`

113`is the language of the grammar.`

114`115`

`Some definitions for nonterminals and expressions:`

116`117`

`[1] A nonterminal A is called reachable if it is possible to derive`

118`a parsing expression from the start expression which contains A.`

119`120`

`[2] A nonterminal A is called useful if it is possible to derive a`

121`sentence from it.`

122`123`

`[3] A nonterminal A is called recursive if it is possible to derive`

124`a parsing expression from it which contains A, again.`

125`126`

`[4] The FIRST set of a nonterminal A contains all the symbols which`

127`can occur of as the leftmost symbol in a parsing expression`

128`derived from A. If the FIRST set contains A itself then that`

129`nonterminal is called left-recursive.`

130`131`

`[5] The LAST set of a nonterminal A contains all the symbols which`

132`can occur of as the rightmost symbol in a parsing expression`

133`derived from A. If the LAST set contains A itself then that non‐`

134`terminal is called right-recursive.`

135`136`

`[6] The FOLLOW set of a nonterminal A contains all the symbols which`

137`can occur after A in a parsing expression derived from the start`

138`expression.`

139`140`

`[7] A nonterminal (or parsing expression) is called nullable if the`

141`empty sentence can be derived from it.`

142`143`

`And based on the above definitions for grammars:`

144`145`

`[1] A grammar G is recursive if and only if it contains a nontermi‐`

146`nal A which is recursive. The terms left- and right-recursive,`

147`and useful are analogously defined.`

148`149`

`[2] A grammar is minimal if it contains only reachable and useful`

150`nonterminals.`

151`152`

`[3] A grammar is wellformed if it is not left-recursive. Such gram‐`

153`mars are also complete, which means that they always succeed or`

154`fail on all input sentences. For an incomplete grammar on the`

155`other hand input sentences exist for which an attempt to match`

156`them against the grammar will not terminate.`

157`158`

`[4] As we wish to allow ourselves to build a grammar incrementally`

159`in a container object we will encounter stages where the RHS of`

160`one or more rules reference symbols which are not yet known to`

161`the container. Such a grammar we call invalid. We cannot use`

162`the term incomplete as this term is already taken, see the last`

163`item.`

164`165`

`CONTAINER CLASS API`

166`The package exports the API described here.`

167`168`

`::grammar::peg pegName ?=|:=|<--|as|deserialize src?`

169`The command creates a new container object for a parsing expres‐`

170`sion grammar and returns the fully qualified name of the object`

171`command as its result. The API the returned command is following`

172`is described in the section CONTAINER OBJECT API. It may be used`

173`to invoke various operations on the container and the grammar`

174`within.`

175`176`

`The new container, i.e. grammar will be empty if no src is spec‐`

177`ified. Otherwise it will contain a copy of the grammar contained`

178`in the src. The src has to be a container object reference for`

179`all operators except deserialize. The deserialize operator`

180`requires src to be the serialization of a parsing expression`

181`grammar instead.`

182`183`

`An empty grammar has no nonterminal symbols, and the start`

184`expression is the empty expression, i.e. epsilon. It is valid,`

185`but not useful.`

186`187`

`CONTAINER OBJECT API`

188`All grammar container objects provide the following methods for the`

189`manipulation of their contents:`

190`191`

`pegName destroy`

192`Destroys the grammar, including its storage space and associated`

193`command.`

194`195`

`pegName clear`

196`Clears out the definition of the grammar contained in pegName,`

197`but does not destroy the object.`

198`199`

`pegName = srcPEG`

200`Assigns the contents of the grammar contained in srcPEG to peg‐`

201`Name, overwriting any existing definition. This is the assign‐`

202`ment operator for grammars. It copies the grammar contained in`

203`the grammar object srcPEG over the grammar definition in peg‐`

204`Name. The old contents of pegName are deleted by this operation.`

205`206`

`This operation is in effect equivalent to`

207`208`

`209`

`210`

`pegName deserialize [srcPEG serialize]`

211`212`

`213`

`pegName --> dstPEG`

214`This is the reverse assignment operator for grammars. It copies`

215`the automation contained in the object pegName over the grammar`

216`definition in the object dstPEG. The old contents of dstPEG are`

217`deleted by this operation.`

218`219`

`This operation is in effect equivalent to`

220`221`

`222`

`223`

`dstPEG deserialize [pegName serialize]`

224`225`

`226`

`pegName serialize`

227`This method serializes the grammar stored in pegName. In other`

228`words it returns a tcl value completely describing that grammar.`

229`This allows, for example, the transfer of grammars over arbi‐`

230`trary channels, persistence, etc. This method is also the basis`

231`for both the copy constructor and the assignment operator.`

232`233`

`The result of this method has to be semantically identical over`

234`all implementations of the grammar::peg interface. This is what`

235`will enable us to copy grammars between different implementa‐`

236`tions of the same interface.`

237`238`

`The result is a list of four elements with the following struc‐`

239`ture:`

240`241`

`[1] The constant string grammar::peg.`

242`243`

`[2] A dictionary. Its keys are the names of all known nonter‐`

244`minal symbols, and their associated values are the pars‐`

245`ing expressions describing their sentennial structure.`

246`247`

`[3] A dictionary. Its keys are the names of all known nonter‐`

248`minal symbols, and their associated values hints to a`

249`matcher regarding the semantic values produced by the`

250`symbol.`

251`252`

`[4] The last item is a parsing expression, the start expres‐`

253`sion of the grammar.`

254`255`

`Assuming the following PEG for simple mathematical expressions`

256`257`

`258`

`259`

`Digit <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'`

260`Sign <- '+' / '-'`

261`Number <- Sign? Digit+`

262`Expression <- '(' Expression ')' / (Factor (MulOp Factor)*)`

263`MulOp <- '*' / '/'`

264`Factor <- Term (AddOp Term)*`

265`AddOp <- '+'/'-'`

266`Term <- Number`

267`268`

`269`

`a possible serialization is`

270`271`

`272`

`273`

`grammar::peg \\`

274`{Expression {/ {x ( Expression )} {x Factor {* {x MulOp Factor}}}} \\`

275`Factor {x Term {* {x AddOp Term}}} \\`

276`Term Number \\`

277`MulOp {/ * /} \\`

278`AddOp {/ + -} \\`

279`Number {x {? Sign} {+ Digit}} \\`

280`Sign {/ + -} \\`

281`Digit {/ 0 1 2 3 4 5 6 7 8 9} \\`

282`} \\`

283`{Expression value Factor value \\`

284`Term value MulOp value \\`

285`AddOp value Number value \\`

286`Sign value Digit value \\`

287`}`

288`Expression`

289`290`

`291`

`A possible one, because the order of the nonterminals in the dictionary`

292`is not relevant.`

293`294`

`pegName deserialize serialization`

295`This is the complement to serialize. It replaces the grammar`

296`definition in pegName with the grammar described by the serial‐`

297`ization value. The old contents of pegName are deleted by this`

298`operation.`

299`300`

`pegName is valid`

301`A predicate. It tests whether the PEG in pegName is valid. See`

302`section TERMS & CONCEPTS for the definition of this grammar`

303`property. The result is a boolean value. It will be set to true`

304`if the PEG has the tested property, and false otherwise.`

305`306`

`pegName start ?pe?`

307`This method defines the start expression of the grammar. It`

308`replaces the previously defined start expression with the pars‐`

309`ing expression pe. The method fails and throws an error if pe`

310`does not contain a valid parsing expression as specified in the`

311`section PARSING EXPRESSIONS. In that case the existing start`

312`expression is not changed. The method returns the empty string`

313`as its result.`

314`315`

`If the method is called without an argument it will return the`

316`currently defined start expression.`

317`318`

`pegName nonterminals`

319`Returns the set of all nonterminal symbols known to the grammar.`

320`321`

`pegName nonterminal add nt pe`

322`This method adds the nonterminal nt and its associated parsing`

323`expression pe to the set of nonterminal symbols and rules of the`

324`PEG contained in the object pegName. The method fails and`

325`throws an error if either the string nt is already known as a`

326`symbol of the grammar, or if pe does not contain a valid parsing`

327`expression as specified in the section PARSING EXPRESSIONS. In`

328`that case the current set of nonterminal symbols and rules is`

329`not changed. The method returns the empty string as its result.`

330`331`

`pegName nonterminal delete nt1 ?nt2 ...?`

332`This method removes the named symbols nt1, nt2 from the set of`

333`nonterminal symbols of the PEG contained in the object pegName.`

334`The method fails and throws an error if any of the strings is`

335`not known as a nonterminal symbol. In that case the current set`

336`of nonterminal symbols is not changed. The method returns the`

337`empty string as its result.`

338`339`

`The stored grammar becomes invalid if the deleted nonterminals`

340`are referenced by the RHS of still-known rules.`

341`342`

`pegName nonterminal exists nt`

343`A predicate. It tests whether the nonterminal symbol nt is known`

344`to the PEG in pegName. The result is a boolean value. It will`

345`be set to true if the symbol nt is known, and false otherwise.`

346`347`

`pegName nonterminal rename nt ntnew`

348`This method renames the nonterminal symbol nt to ntnew. The`

349`method fails and throws an error if either nt is not known as a`

350`nonterminal, or if ntnew is a known symbol. The method returns`

351`the empty string as its result.`

352`353`

`pegName nonterminal mode nt ?mode?`

354`This mode returns or sets the semantic mode associated with the`

355`nonterminal symbol nt. If no mode is specified the current mode`

356`of the nonterminal is returned. Otherwise the current mode is`

357`set to mode. The method fails and throws an error if nt is not`

358`known as a nonterminal. The grammar interpreter implemented by`

359`the package grammar::peg::interpreter recognizes the following`

360`modes:`

361`362`

`value The semantic value of the nonterminal is the abstract`

363`syntax tree created from the AST's of the RHS and a node`

364`for the nonterminal itself.`

365`366`

`match The semantic value of the nonterminal is an the abstract`

367`syntax tree consisting of single a node for the string`

368`matched by the RHS. The ASTs generated by the RHS are`

369`discarded.`

370`371`

`leaf The semantic value of the nonterminal is an the abstract`

372`syntax tree consisting of single a node for the nontermi‐`

373`nal itself. The ASTs generated by the RHS are discarded.`

374`375`

`discard`

376`The nonterminal has no semantic value. The ASTs generated`

377`by the RHS are discarded (as well).`

378`379`

`pegName nonterminal rule nt`

380`This method returns the parsing expression associated with the`

381`nonterminal nt. The method fails and throws an error if nt is`

382`not known as a nonterminal.`

383`384`

`pegName unknown nonterminals`

385`This method returns a list containing the names of all nontermi‐`

386`nal symbols which are referenced on the RHS of a grammatical`

387`rule, but have no rule definining their structure. In other`

388`words, a list of the nonterminal symbols which make the grammar`

389`invalid. The grammar is valid if this list is empty.`

390`391`

`PARSING EXPRESSIONS`

392`Various methods of PEG container objects expect a parsing expression as`

393`their argument, or will return such. This section specifies the format`

394`such parsing expressions are in.`

395`396`

`[1] The string epsilon is an atomic parsing expression. It matches`

397`the empty string.`

398`399`

`[2] The string alnum is an atomic parsing expression. It matches any`

400`alphanumeric character.`

401`402`

`[3] The string alpha is an atomic parsing expression. It matches any`

403`alphabetical character.`

404`405`

`[4] The string dot is an atomic parsing expression. It matches any`

406`character.`

407`408`

`[5] The expression [list t x] is an atomic parsing expression. It`

409`matches the terminal string x.`

410`411`

`[6] The expression [list n A] is an atomic parsing expression. It`

412`matches the nonterminal A.`

413`414`

`[7] For parsing expressions e1, e2, ... the result of [list / e1 e2`

415`... ] is a parsing expression as well. This is the ordered`

416`choice, aka prioritized choice.`

417`418`

`[8] For parsing expressions e1, e2, ... the result of [list x e1 e2`

419`... ] is a parsing expression as well. This is the sequence.`

420`421`

`[9] For a parsing expression e the result of [list * e] is a parsing`

422`expression as well. This is the kleene closure, describing zero`

423`or more repetitions.`

424`425`

`[10] For a parsing expression e the result of [list + e] is a parsing`

426`expression as well. This is the positive kleene closure,`

427`describing one or more repetitions.`

428`429`

`[11] For a parsing expression e the result of [list & e] is a parsing`

430`expression as well. This is the and lookahead predicate.`

431`432`

`[12] For a parsing expression e the result of [list ! e] is a parsing`

433`expression as well. This is the not lookahead predicate.`

434`435`

`[13] For a parsing expression e the result of [list ? e] is a parsing`

436`expression as well. This is the optional input.`

437`438`

`Examples of parsing expressions where already shown, in the description`

439`of the method serialize.`

440

## PARSING EXPRESSION GRAMMARS

442`For the mathematically inclined, a PEG is a 4-tuple (VN,VT,R,eS) where`

443`444`

`· VN is a set of nonterminal symbols,`

445`446`

`· VT is a set of terminal symbols,`

447`448`

`· R is a finite set of rules, where each rule is a pair (A,e), A`

449`in VN, and e a parsing expression.`

450`451`

`· eS is a parsing expression, the start expression.`

452`453`

`Further constraints are`

454`455`

`· The intersection of VN and VT is empty.`

456`457`

`· For all A in VT exists exactly one pair (A,e) in R. In other`

458`words, R is a function from nonterminal symbols to parsing`

459`expressions.`

460`461`

`Parsing expression are inductively defined via`

462`463`

`· The empty string (epsilon) is a parsing expression.`

464`465`

`· A terminal symbol a is a parsing expression.`

466`467`

`· A nonterminal symbol A is a parsing expression.`

468`469`

`· e1e2 is a parsing expression for parsing expressions e1 and 2.`

470`This is called sequence.`

471`472`

`· e1/e2 is a parsing expression for parsing expressions e1 and 2.`

473`This is called ordered choice.`

474`475`

`· e* is a parsing expression for parsing expression e. This is`

476`called zero-or-more repetitions, also known as kleene closure.`

477`478`

`· e+ is a parsing expression for parsing expression e. This is`

479`called one-or-more repetitions, also known as positive kleene`

480`closure.`

481`482`

`· !e is a parsing expression for parsing expression e1. This is`

483`called a not lookahead predicate.`

484`485`

`· &e is a parsing expression for parsing expression e1. This is`

486`called an and lookahead predicate.`

487`488`

`PEGs are used to define a grammatical structure for streams of symbols`

489`over VT. They are a modern phrasing of older formalisms invented by`

490`Alexander Birham. These formalisms were called TS (TMG recognition`

491`scheme), and gTS (generalized TS). Later they were renamed to TPDL`

492`(Top-Down Parsing Languages) and gTPDL (generalized TPDL).`

493`494`

`They can be easily implemented by recursive descent parsers with back‐`

495`tracking. This makes them relatives of LL(k) Context-Free Grammars.`

496

## REFERENCES

498`[1] The Packrat Parsing and Parsing Expression Grammars Page`

499`[http://www.pdos.lcs.mit.edu/~baford/packrat/], by Bryan Ford,`

500`Massachusetts Institute of Technology. This is the main entry`

501`page to PEGs, and their realization through Packrat Parsers.`

502`503`

`[2] Parsing Techniques - A Practical Guide`

504`[http://www.cs.vu.nl/~dick/PTAPG.html], an online book offering`

505`a clear, accessible, and thorough discussion of many different`

506`parsing techniques with their interrelations and applicabili‐`

507`ties, including error recovery techniques.`

508`509`

`[3] Compilers and Compiler Generators [http://scifac.ru.ac.za/com‐`

510`pilers/], an online book using CoCo/R, a generator for recursive`

511`descent parsers.`

512

## BUGS, IDEAS, FEEDBACK

514`This document, and the package it describes, will undoubtedly contain`

515`bugs and other problems. Please report such in the category gram‐`

516`mar_peg of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].`

517`Please also report any ideas for enhancements you may have for either`

518`package and/or documentation.`

519`520`

`When proposing code changes, please provide unified diffs, i.e the out‐`

521`put of diff -u.`

522`523`

`Note further that attachments are strongly preferred over inlined`

524`patches. Attachments can be made by going to the Edit form of the`

525`ticket immediately after its creation, and then using the left-most`

526`button in the secondary navigation bar.`

527

## KEYWORDS

529`LL(k), TDPL, context-free languages, expression, grammar, parsing,`

530`parsing expression, parsing expression grammar, push down automaton,`

531`recursive descent, state, top-down parsing languages, transducer`

532

## CATEGORY

534`Grammars and finite automata`

535

## COPYRIGHT

537`Copyright (c) 2005 Andreas Kupries <andreas_kupries@users.sourceforge.net>`

538`539`

`540`

`541`

`542`

`tcllib 0.1 grammar::peg(n)`