pt::peg::to::peg(n)              Parser Tools              pt::peg::to::peg(n)

______________________________________________________________________________

NAME
       pt::peg::to::peg - PEG Conversion. Write PEG format

SYNOPSIS
       package require Tcl 8.5

       package require pt::peg::to::peg ?1.0.2?

       package require pt::peg

       package require pt::pe

       package require text::write

       pt::peg::to::peg reset

       pt::peg::to::peg configure

       pt::peg::to::peg configure option

       pt::peg::to::peg configure option value...

       pt::peg::to::peg convert serial

______________________________________________________________________________

DESCRIPTION
       Are you lost? Do you have trouble understanding this document? In that
       case please read the overview provided by the Introduction to Parser
       Tools. That document is the entry point to the whole system the
       current package is a part of.

       This package implements the converter from parsing expression grammars
       to PEG markup.

       It resides in the Export section of the Core Layer of Parser Tools, and
       can be used either directly with the other packages of this layer, or
       indirectly through the export manager provided by pt::peg::export. The
       latter is intended for use in untrusted environments and is done
       through the corresponding export plugin pt::peg::export::peg, which
       sits between converter and export manager.

       IMAGE: arch_core_eplugins

API
       The API provided by this package satisfies the specification of the
       Converter API found in the Parser Tools Export API specification.

       pt::peg::to::peg reset
              This command resets the configuration of the package to its
              default settings.

       pt::peg::to::peg configure
              This command returns a dictionary containing the current
              configuration of the package.

       pt::peg::to::peg configure option
              This command returns the current value of the specified
              configuration option of the package. For the set of legal
              options, please read the section Options.

       pt::peg::to::peg configure option value...
              This command sets the given configuration options of the
              package to the specified values. For the set of legal options,
              please read the section Options.

       pt::peg::to::peg convert serial
              This command takes the canonical serialization of a parsing
              expression grammar, as specified in section PEG serialization
              format, and contained in serial, and generates PEG markup
              encoding the grammar, per the current package configuration.
              The created string is then returned as the result of the
              command.
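
       The commands above might be combined as in the following minimal
       sketch. It assumes that the Tcllib pt packages are installed. The
       two-rule grammar, the grammar name signed_number and the user name
       alice are made up for illustration, and pt::peg canonicalize is used
       on the assumption that it yields the canonical form that convert
       expects.

```tcl
package require Tcl 8.5
package require pt::peg          ;# assumed to provide 'canonicalize'
package require pt::peg::to::peg

# Hypothetical grammar:  Number <- Sign? <ddigit>+ ;  Sign <- '-' / '+' ;
set raw {
    pt::grammar::peg {
        rules {
            Number {is {x {? {n Sign}} {+ ddigit}} mode value}
            Sign   {is {/ {t -} {t +}} mode value}
        }
        start {n Number}
    }
}

# 'convert' takes the canonical serialization, so normalize the value first.
set serial [pt::peg canonicalize $raw]

# Start from the default settings, then set illustrative option values.
pt::peg::to::peg reset
pt::peg::to::peg configure -name signed_number -user alice

# Generate the PEG markup and print it.
set markup [pt::peg::to::peg convert $serial]
puts $markup
```

       Calling reset first keeps the result independent of options set
       earlier in the same interpreter.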

OPTIONS
       The converter to the PEG language recognizes the following options and
       changes its behaviour as they specify.

       -file string
              The value of this option is the name of the file or other
              entity from which the grammar came, for which the command is
              run. The default value is unknown.

       -name string
              The value of this option is the name of the grammar we are
              processing. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the
              command is run. The default value is unknown.

       -template string
              The value of this option is a string into which to put the
              generated text and the values of the other options. The
              various locations for user-data are expected to be specified
              with the placeholders listed below. The default value is
              "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant PEG.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @code@ To be replaced with the generated text.
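
       The effect of the placeholders can be illustrated with plain Tcl. The
       sketch below only mimics the documented replacement using string map;
       the helper fill_template and all sample values are hypothetical, and
       the package's own implementation may differ.

```tcl
# Hypothetical helper: emulates the documented placeholder replacement.
# [string map] does not rescan replaced text, so option values that happen
# to contain a placeholder are not substituted a second time.
proc fill_template {template user file name code} {
    set map [list @user@ $user @format@ PEG @file@ $file @name@ $name @code@ $code]
    return [string map $map $template]
}

# A template adding a two-line comment header in front of the generated text.
set template "# Grammar @name@ in @format@ format\n# Source: @file@, generated for @user@\n@code@"

# Stand-in for the text the converter would generate.
set code "PEG calculator (Expression)\nEND;"

set result [fill_template $template alice calc.peg calculator $code]
puts $result
```

       With the package itself, such a template would be installed through
       pt::peg::to::peg configure -template.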

PEG SPECIFICATION LANGUAGE
       peg, a language for the specification of parsing expression grammars,
       is meant to be human readable and writable, yet strict enough, like
       any computer language, to allow its processing by machine. It was
       defined to make writing the specification of a grammar easy, something
       the other formats found in the Parser Tools do not lend themselves to.

       It is formally specified by the grammar shown below, written in
       itself. For a tutorial / introduction to the language please go and
       read the PEG Language Tutorial.

       PEG pe-grammar-for-peg (Grammar)

           # --------------------------------------------------------------------
           # Syntactical constructs

                 Grammar         <- WHITESPACE Header Definition* Final EOF ;

                 Header          <- PEG Identifier StartExpr ;
                 Definition      <- Attribute? Identifier IS Expression SEMICOLON ;
                 Attribute       <- (VOID / LEAF) COLON ;
                 Expression      <- Sequence (SLASH Sequence)* ;
                 Sequence        <- Prefix+ ;
                 Prefix          <- (AND / NOT)? Suffix ;
                 Suffix          <- Primary (QUESTION / STAR / PLUS)? ;
                 Primary         <- ALNUM / ALPHA / ASCII / CONTROL / DDIGIT / DIGIT
                                 /  GRAPH / LOWER / PRINTABLE / PUNCT / SPACE / UPPER
                                 /  WORDCHAR / XDIGIT
                                 /  Identifier
                                 /  OPEN Expression CLOSE
                                 /  Literal
                                 /  Class
                                 /  DOT
                                 ;
                 Literal         <- APOSTROPH  (!APOSTROPH  Char)* APOSTROPH  WHITESPACE
                                 /  DAPOSTROPH (!DAPOSTROPH Char)* DAPOSTROPH WHITESPACE ;
                 Class           <- OPENB (!CLOSEB Range)* CLOSEB WHITESPACE ;
                 Range           <- Char TO Char / Char ;

                 StartExpr       <- OPEN Expression CLOSE ;
           void: Final           <- "END" WHITESPACE SEMICOLON WHITESPACE ;

           # --------------------------------------------------------------------
           # Lexing constructs

                 Identifier      <- Ident WHITESPACE ;
           leaf: Ident           <- ([_:] / <alpha>) ([_:] / <alnum>)* ;
                 Char            <- CharSpecial / CharOctalFull / CharOctalPart
                                 /  CharUnicode / CharUnescaped
                                 ;

           leaf: CharSpecial     <- "\\" [nrt'"\[\]\\] ;
           leaf: CharOctalFull   <- "\\" [0-2][0-7][0-7] ;
           leaf: CharOctalPart   <- "\\" [0-7][0-7]? ;
           leaf: CharUnicode     <- "\\" 'u' HexDigit (HexDigit (HexDigit HexDigit?)?)? ;
           leaf: CharUnescaped   <- !"\\" . ;

           void: HexDigit        <- [0-9a-fA-F] ;

           void: TO              <- '-'           ;
           void: OPENB           <- "["           ;
           void: CLOSEB          <- "]"           ;
           void: APOSTROPH       <- "'"           ;
           void: DAPOSTROPH      <- '"'           ;
           void: PEG             <- "PEG" !([_:] / <alnum>) WHITESPACE ;
           void: IS              <- "<-"    WHITESPACE ;
           leaf: VOID            <- "void"  WHITESPACE ; # Implies that definition has no semantic value.
           leaf: LEAF            <- "leaf"  WHITESPACE ; # Implies that definition has no terminals.
           void: SEMICOLON       <- ";"     WHITESPACE ;
           void: COLON           <- ":"     WHITESPACE ;
           void: SLASH           <- "/"     WHITESPACE ;
           leaf: AND             <- "&"     WHITESPACE ;
           leaf: NOT             <- "!"     WHITESPACE ;
           leaf: QUESTION        <- "?"     WHITESPACE ;
           leaf: STAR            <- "*"     WHITESPACE ;
           leaf: PLUS            <- "+"     WHITESPACE ;
           void: OPEN            <- "("     WHITESPACE ;
           void: CLOSE           <- ")"     WHITESPACE ;
           leaf: DOT             <- "."     WHITESPACE ;

           leaf: ALNUM           <- "<alnum>"    WHITESPACE ;
           leaf: ALPHA           <- "<alpha>"    WHITESPACE ;
           leaf: ASCII           <- "<ascii>"    WHITESPACE ;
           leaf: CONTROL         <- "<control>"  WHITESPACE ;
           leaf: DDIGIT          <- "<ddigit>"   WHITESPACE ;
           leaf: DIGIT           <- "<digit>"    WHITESPACE ;
           leaf: GRAPH           <- "<graph>"    WHITESPACE ;
           leaf: LOWER           <- "<lower>"    WHITESPACE ;
           leaf: PRINTABLE       <- "<print>"    WHITESPACE ;
           leaf: PUNCT           <- "<punct>"    WHITESPACE ;
           leaf: SPACE           <- "<space>"    WHITESPACE ;
           leaf: UPPER           <- "<upper>"    WHITESPACE ;
           leaf: WORDCHAR        <- "<wordchar>" WHITESPACE ;
           leaf: XDIGIT          <- "<xdigit>"   WHITESPACE ;

           void: WHITESPACE      <- (" " / "\t" / EOL / COMMENT)* ;
           void: COMMENT         <- '#' (!EOL .)* EOL ;
           void: EOL             <- "\n\r" / "\n" / "\r" ;
           void: EOF             <- !. ;

           # --------------------------------------------------------------------
       END;

   EXAMPLE
       Our example specifies the grammar for a basic 4-operation calculator.

       PEG calculator (Expression)
           Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
           Sign       <- '-' / '+'                               ;
           Number     <- Sign? Digit+                            ;
           Expression <- Term (AddOp Term)*                      ;
           MulOp      <- '*' / '/'                               ;
           Term       <- Factor (MulOp Factor)*                  ;
           AddOp      <- '+'/'-'                                 ;
           Factor     <- '(' Expression ')' / Number             ;
       END;

       Using higher-level features of the notation, i.e. the character classes
       (predefined and custom), this example can be rewritten as

       PEG calculator (Expression)
           Sign       <- [-+]                        ;
           Number     <- Sign? <ddigit>+             ;
           Expression <- Term (AddOp Term)*          ;
           MulOp      <- [*/]                        ;
           Term       <- Factor (MulOp Factor)*      ;
           AddOp      <- [-+]                        ;
           Factor     <- '(' Expression ')' / Number ;
       END;


PEG SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expression Grammars as immutable values for transport,
       comparison, etc.

       We distinguish between regular and canonical serializations. While a
       PEG may have more than one regular serialization, exactly one of them
       will be canonical.

       regular serialization

              [1]    The serialization of any PEG is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, pt::grammar::peg,
                     and its value. This value holds the contents of the
                     grammar.

              [3]    The contents of the grammar are a Tcl dictionary holding
                     the set of nonterminal symbols and the starting
                     expression. The relevant keys and their values are

                     rules  The value is a Tcl dictionary whose keys are the
                            names of the nonterminal symbols known to the
                            grammar.

                            [1]    Each nonterminal symbol may occur only
                                   once.

                            [2]    The empty string is not a legal
                                   nonterminal symbol.

                            [3]    The value for each symbol is a Tcl
                                   dictionary itself. The relevant keys and
                                   their values in this dictionary are

                                   is     The value is the serialization of
                                          the parsing expression describing
                                          the symbol's sentential structure,
                                          as specified in the section PE
                                          serialization format.

                                   mode   The value can be one of three
                                          values specifying how a parser
                                          should handle the semantic value
                                          produced by the symbol.

                                          value  The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal itself,
                                                 which has the ASTs of the
                                                 symbol's right hand side as
                                                 its children.

                                          leaf   The semantic value of the
                                                 nonterminal symbol is an
                                                 abstract syntax tree
                                                 consisting of a single node
                                                 for the nonterminal, without
                                                 any children. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded.

                                          void   The nonterminal has no
                                                 semantic value. Any ASTs
                                                 generated by the symbol's
                                                 right hand side are
                                                 discarded (as well).

                     start  The value is the serialization of the start
                            parsing expression of the grammar, as specified
                            in the section PE serialization format.

              [4]    The terminal symbols of the grammar are specified
                     implicitly as the set of all terminal symbols used in
                     the start expression and on the RHS of the grammar
                     rules.

       canonical serialization
              The canonical serialization of a grammar has the format as
              specified in the previous item, and then additionally satisfies
              the constraints below, which make it unique among all the
              possible serializations of this grammar.

              [1]    The keys found in all the nested Tcl dictionaries are
                     sorted in ascending dictionary order, as generated by
                     Tcl's builtin command lsort -increasing -dict.

              [2]    The string representation of the value is the canonical
                     representation of a Tcl dictionary. I.e. it does not
                     contain superfluous whitespace.

   EXAMPLE
       Assuming the following PEG for simple mathematical expressions

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
                  Sign       <- '-' / '+'                               ;
                  Number     <- Sign? Digit+                            ;
                  Expression <- Term (AddOp Term)*                      ;
                  MulOp      <- '*' / '/'                               ;
                  Term       <- Factor (MulOp Factor)*                  ;
                  AddOp      <- '+'/'-'                                 ;
                  Factor     <- '(' Expression ')' / Number             ;
              END;

       then its canonical serialization (except for whitespace) is

              pt::grammar::peg {
                  rules {
                      AddOp      {is {/ {t +} {t -}} mode value}
                      Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
                      Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
                      Factor     {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
                      MulOp      {is {/ {t *} {t /}} mode value}
                      Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
                      Sign       {is {/ {t -} {t +}} mode value}
                      Term       {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
                  }
                  start {n Expression}
              }
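
       The two canonicalization constraints can be sketched with core Tcl
       commands alone. The example is an illustration only, not the
       package's own code: the two-rule grammar is hypothetical, and the
       inner rule values are simply written in compact form by hand.

```tcl
# Rules of a small, hypothetical grammar, deliberately given out of order.
# The keys 'is' and 'mode' inside each rule are already sorted and compact.
set rules [dict create \
    Sign   {is {/ {t -} {t +}} mode value} \
    Number {is {x {? {n Sign}} {+ ddigit}} mode value}]

# Constraint [1]: order the keys as 'lsort -increasing -dict' does.
set sorted [dict create]
foreach symbol [lsort -increasing -dict [dict keys $rules]] {
    dict set sorted $symbol [dict get $rules $symbol]
}

# Constraint [2]: assembling the outer levels with [list] and [dict create]
# yields a string representation without superfluous whitespace.
set serial [list pt::grammar::peg [dict create rules $sorted start {n Number}]]
puts $serial
```

       Tcl dictionaries preserve insertion order, which is why re-inserting
       the rules in sorted order is enough to fix the order of the keys.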


PE SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools to serialize
       Parsing Expressions as immutable values for transport, comparison,
       etc.

       We distinguish between regular and canonical serializations. While a
       parsing expression may have more than one regular serialization,
       exactly one of them will be canonical.

       Regular serialization

              Atomic Parsing Expressions

                     [1]    The string epsilon is an atomic parsing
                            expression. It matches the empty string.

                     [2]    The string dot is an atomic parsing expression.
                            It matches any character.

                     [3]    The string alnum is an atomic parsing expression.
                            It matches any Unicode alphabet or digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [4]    The string alpha is an atomic parsing expression.
                            It matches any Unicode alphabet character. This
                            is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [5]    The string ascii is an atomic parsing expression.
                            It matches any Unicode character below U+0080.
                            This is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [6]    The string control is an atomic parsing
                            expression. It matches any Unicode control
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [7]    The string digit is an atomic parsing expression.
                            It matches any Unicode digit character. Note that
                            this includes characters outside of the [0..9]
                            range. This is a custom extension of PEs based on
                            Tcl's builtin command string is.

                     [8]    The string graph is an atomic parsing expression.
                            It matches any Unicode printing character, except
                            for space. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [9]    The string lower is an atomic parsing expression.
                            It matches any Unicode lower-case alphabet
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [10]   The string print is an atomic parsing expression.
                            It matches any Unicode printing character,
                            including space. This is a custom extension of
                            PEs based on Tcl's builtin command string is.

                     [11]   The string punct is an atomic parsing expression.
                            It matches any Unicode punctuation character.
                            This is a custom extension of PEs based on Tcl's
                            builtin command string is.

                     [12]   The string space is an atomic parsing expression.
                            It matches any Unicode space character. This is a
                            custom extension of PEs based on Tcl's builtin
                            command string is.

                     [13]   The string upper is an atomic parsing expression.
                            It matches any Unicode upper-case alphabet
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [14]   The string wordchar is an atomic parsing
                            expression. It matches any Unicode word
                            character. This is any alphanumeric character
                            (see alnum), and any connector punctuation
                            characters (e.g. underscore). This is a custom
                            extension of PEs based on Tcl's builtin command
                            string is.

                     [15]   The string xdigit is an atomic parsing
                            expression. It matches any hexadecimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command string is.

                     [16]   The string ddigit is an atomic parsing
                            expression. It matches any decimal digit
                            character. This is a custom extension of PEs
                            based on Tcl's builtin command regexp.

                     [17]   The expression [list t x] is an atomic parsing
                            expression. It matches the terminal string x.

                     [18]   The expression [list n A] is an atomic parsing
                            expression. It matches the nonterminal A.

              Combined Parsing Expressions

                     [1]    For parsing expressions e1, e2, ... the result of
                            [list / e1 e2 ... ] is a parsing expression as
                            well. This is the ordered choice, aka prioritized
                            choice.

                     [2]    For parsing expressions e1, e2, ... the result of
                            [list x e1 e2 ... ] is a parsing expression as
                            well. This is the sequence.

                     [3]    For a parsing expression e the result of
                            [list * e] is a parsing expression as well. This
                            is the Kleene closure, describing zero or more
                            repetitions.

                     [4]    For a parsing expression e the result of
                            [list + e] is a parsing expression as well. This
                            is the positive Kleene closure, describing one or
                            more repetitions.

                     [5]    For a parsing expression e the result of
                            [list & e] is a parsing expression as well. This
                            is the and lookahead predicate.

                     [6]    For a parsing expression e the result of
                            [list ! e] is a parsing expression as well. This
                            is the not lookahead predicate.

                     [7]    For a parsing expression e the result of
                            [list ? e] is a parsing expression as well. This
                            is the optional input.

       Canonical serialization
              The canonical serialization of a parsing expression has the
              format as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among all
              the possible serializations of this parsing expression.

              [1]    The string representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not
                     contain superfluous whitespace.

              [2]    Terminals are not encoded as ranges (where start and end
                     of the range are identical).
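
       The difference between the atomic expressions digit and ddigit noted
       in the list above can be illustrated with core Tcl. This is a sketch
       only: the regular expression standing in for ddigit is an assumption
       about how decimal digits are matched, not the package's own code.

```tcl
# U+0663 ARABIC-INDIC DIGIT THREE: a Unicode digit outside of the 0..9 range.
set ch \u0663

# Class 'digit', based on [string is digit].
set isDigit [string is digit -strict $ch]

# Assumed stand-in for 'ddigit': a regexp restricted to the decimal digits.
set isDDigit [regexp {^[0-9]$} $ch]

# Class 'xdigit', based on [string is xdigit], applied to a hexadecimal digit.
set isXDigit [string is xdigit -strict f]

puts "digit=$isDigit ddigit=$isDDigit xdigit=$isXDigit"
```

       Grammars which must accept only 0 to 9 are therefore better served by
       ddigit than by digit.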

   EXAMPLE
       Assuming the parsing expression shown on the right-hand side of the
       rule

              Expression <- Term (AddOp Term)*

       then its canonical serialization (except for whitespace) is

              {x {n Term} {* {x {n AddOp} {n Term}}}}
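
       Because a serialized parsing expression is a plain Tcl list, the value
       above can be assembled with the builtin list command, following the
       [list ...] forms given for atomic and combined expressions. The
       variable names in this minimal sketch are chosen for illustration
       only.

```tcl
# Atomic parsing expressions for the nonterminals Term and AddOp.
set term  [list n Term]
set addop [list n AddOp]

# Term (AddOp Term)* is a sequence (x) of Term and the Kleene closure (*)
# of the sequence AddOp Term.
set pe [list x $term [list * [list x $addop $term]]]
puts $pe
```

       Building the value with list rather than by string concatenation
       leaves the quoting to Tcl and avoids superfluous whitespace.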


BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS
       EBNF, LL(k), PEG, TDPL, context-free languages, conversion, expression,
       format conversion, grammar, matching, parser, parsing expression,
       parsing expression grammar, push down automaton, recursive descent,
       serialization, state, top-down parsing languages, transducer

CATEGORY
       Parsing and Grammars

COPYRIGHT
       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                               1.0.2                 pt::peg::to::peg(n)