1pt::peg::to::cparam(n) Parser Tools pt::peg::to::cparam(n)
2
3
4
5______________________________________________________________________________
6
8 pt::peg::to::cparam - PEG Conversion. Write CPARAM format
9
11 package require Tcl 8.5
12
13 package require pt::peg::to::cparam ?1.1.2?
14
15 pt::peg::to::cparam reset
16
17 pt::peg::to::cparam configure
18
19 pt::peg::to::cparam configure option
20
21 pt::peg::to::cparam configure option value...
22
23 pt::peg::to::cparam convert serial
24
25______________________________________________________________________________
26
28 Are you lost? Do you have trouble understanding this document? In
29 that case please read the overview provided by the Introduction to
30 Parser Tools. That document is the entry point to the whole system the
31 current package is a part of.
32
33 This package implements the converter from parsing expression grammars
34 to CPARAM markup.
35
36 It resides in the Export section of the Core Layer of Parser Tools, and
37 can be used either directly with the other packages of this layer, or
38 indirectly through the export manager provided by pt::peg::export. The
39 latter is intended for use in untrusted environments and is done through
40 the corresponding export plugin pt::peg::export::cparam sitting between
41 converter and export manager.
42
43 IMAGE: arch_core_eplugins
44
46 The API provided by this package satisfies the specification of the
47 Converter API found in the Parser Tools Export API specification.
48
49 pt::peg::to::cparam reset
50 This command resets the configuration of the package to its
51 default settings.
52
53 pt::peg::to::cparam configure
54 This command returns a dictionary containing the current config‐
55 uration of the package.
56
57 pt::peg::to::cparam configure option
58 This command returns the current value of the specified configu‐
59 ration option of the package. For the set of legal options,
60 please read the section Options.
61
62 pt::peg::to::cparam configure option value...
63 This command sets the given configuration options of the pack‐
64 age, to the specified values. For the set of legal options,
65 please read the section Options.
66
67 pt::peg::to::cparam convert serial
68 This command takes the canonical serialization of a parsing
69 expression grammar, as specified in section PEG serialization
70 format, and contained in serial, and generates CPARAM markup
71 encoding the grammar, per the current package configuration.
72 The created string is then returned as the result of the com‐
73 mand.
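
As a rough illustration of the commands above, the following sketch resets the package, sets a few options, queries them, and converts a very small grammar. The grammar (a single rule A matching the character a), the file name tiny.peg and the user name alice are made up for this example. It also assumes that Tcllib with the Parser Tools packages is installed.

```tcl
package require Tcl 8.5
package require pt::peg::to::cparam

# Hypothetical one-rule grammar "A <- 'a'", built with list/dict so that
# the string representation carries no superfluous whitespace.
set serial [list pt::grammar::peg [dict create \
    rules [dict create A [dict create is [list t a] mode value]] \
    start [list n A]]]

pt::peg::to::cparam reset
pt::peg::to::cparam configure -name tiny -file tiny.peg -user alice

# Query a single option, then the whole configuration dictionary.
puts "name: [pt::peg::to::cparam configure -name]"
dict for {option value} [pt::peg::to::cparam configure] {
    puts [format "%-16s %s" $option [string range $value 0 39]]
}

# Generate the C code for the grammar and report its size.
set code [pt::peg::to::cparam convert $serial]
puts "generated [string length $code] characters of C code"
```

With the default template "@code@" the returned string is just the generated parsing functions, so it is normally combined with a framework through the option -template.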
74
OPTIONS
76 The converter to C code recognizes the following configuration vari‐
77 ables and changes its behaviour as they specify.
78
79 -file string
80 The value of this option is the name of the file or other entity
81 from which the grammar came, for which the command is run. The
82 default value is unknown.
83
84 -name string
85 The value of this option is the name of the grammar we are pro‐
86 cessing. The default value is a_pe_grammar.
87
88 -user string
89 The value of this option is the name of the user for which the
90 command is run. The default value is unknown.
91
92 -template string
93 The value of this option is a string into which to put the gen‐
94 erated text and the other configuration settings. The various
95 locations for user-data are expected to be specified with the
96 placeholders listed below. The default value is "@code@".
97
98 @user@ To be replaced with the value of the option -user.
99
100 @format@
101 To be replaced with the constant C/PARAM.
102
103 @file@ To be replaced with the value of the option -file.
104
105 @name@ To be replaced with the value of the option -name.
106
107 @code@ To be replaced with the generated C code.
108
109 The following placeholders are special, in that they occur
110 within the generated code, and are replaced there as well.
111
112 @statedecl@
113 To be replaced with the value of the option -state-decl.
114
115 @stateref@
116 To be replaced with the value of the option -state-ref.
117
118 @strings@
119 To be replaced with the value of the option -string-varname.
121
122 @self@ To be replaced with the value of the option -self-command.
123
124 @def@ To be replaced with the value of the option -fun-qualifier.
126
127 @ns@ To be replaced with the value of the option -namespace.
128
129 @main@ To be replaced with the value of the option -main.
130
131 @prelude@
132 To be replaced with the value of the option -prelude.
133
134 -state-decl string
135 A C string representing the argument declaration to use in the
136 generated parsing functions to refer to the parsing state. In
137 essence type and argument name. The default value is the string
138 RDE_PARAM p.
139
140 -state-ref string
141 A C string representing the argument name used in the generated
142 parsing functions to refer to the parsing state. The default
143 value is the string p.
144
145 -self-command string
146 A C string representing the reference needed to call the generated
147 parser function (methods ...) from another parser function,
148 per the chosen framework (template). The default value is the
149 empty string.
150
151 -fun-qualifier string
152 A C string containing the attributes to give to the generated
153 functions (methods ...), per the chosen framework (template).
154 The default value is static.
155
156 -namespace string
157 The name of the C namespace the parser functions (methods, ...)
158 shall reside in, or a general prefix to add to the function
159 names. The default value is the empty string.
160
161 -main string
162 The name of the main function (method, ...) to be called by the
163 chosen framework (template) to start parsing input. The default
164 value is __main.
165
166 -string-varname string
167 The name of the variable used for the table of strings used by
168 the generated parser, i.e. error messages, symbol names, etc.
169 The default value is p_string.
170
171 -prelude string
172 A snippet of code to be inserted at the head of each generated
173 parsing function. The default value is the empty string.
174
175 -indent integer
176 The number of characters to indent each line of the generated
177 code by. The default value is 0.
178
179 -comments boolean
180 A flag controlling the generation of code comments containing
181 the original parsing expression a parsing function is for. The
182 default value is on.
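
The effect of the placeholders in a template can be pictured as a plain text substitution. The sketch below imitates it with Tcl's builtin string map. It is not the package's own implementation, and all values (alice, calc.peg, calculator, and the stand-in text for @code@) are invented for the illustration.

```tcl
# A small template using the general placeholders described above.
set template "/* Grammar @name@ (@format@) from @file@, for @user@ */\n@code@"

# Hypothetical settings; the @code@ entry stands in for generated C code.
set settings [dict create \
    @user@   alice \
    @format@ C/PARAM \
    @file@   calc.peg \
    @name@   calculator \
    @code@   "/* generated C code would appear here */"]

# Replace every placeholder with its value in a single pass.
set text [string map $settings $template]
puts $text
```

A real template would surround @code@ with the includes, type definitions and entry points of the chosen framework.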
183
184 While the high parameterizability of this converter, as shown by the
185 multitude of options it supports, is an advantage to the advanced user,
186 allowing her to customize the output of the converter as needed, a
187 novice user will likely not see the forest for the trees.
188
189 To help these latter users an adjunct package is provided, containing a
190 canned configuration which will generate immediately useful full
191 parsers. It is
192
193 pt::cparam::configuration::critcl
194 Generated parsers are embedded into a Critcl-based framework.
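
A possible way to apply the canned configuration is sketched below. The signature of the def command (class name, package name, version, command prefix) is an assumption taken from memory of the adjunct package's own manpage and should be checked there. The names calc and calc_parser and the version 1.0 are placeholders.

```tcl
package require pt::peg::to::cparam
package require pt::cparam::configuration::critcl

pt::peg::to::cparam reset

# Assumed signature: def <class> <package> <version> <command prefix>.
# calc, calc_parser and 1.0 are placeholder names for this sketch.
pt::cparam::configuration::critcl def calc calc_parser 1.0 \
    [list pt::peg::to::cparam configure]

# The converter should now carry the canned settings; inspect a few.
foreach option {-main -namespace -fun-qualifier} {
    puts "$option = [pt::peg::to::cparam configure $option]"
}
```

After this call, pt::peg::to::cparam convert is expected to return a complete Critcl-based parser package instead of bare parsing functions.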
195
197 The c format is executable code, a parser for the grammar. The parser
198 implementation is written in C and can be tweaked to the users' needs
199 through a multitude of options.
200
201 The critcl format, for example, is implemented as a canned configura‐
202 tion of these options on top of the generator for c.
203
204 The bulk of such a framework has to be specified through the option
205 -template. The additional options
206
207 -fun-qualifier string
208
209 -main string
210
211 -namespace string
212
213 -prelude string
214
215 -self-command string
216
217 -state-decl string
218
219 -state-ref string
220
221 -string-varname string
222
223 provide code snippets which help to glue framework and generated code
224 together. Their placeholders are in the generated code. Further the
225 options
226
227 -indent integer
228
229 -comments boolean
230
231 allow for the customization of the code indent (default none), and
232 whether to generate comments showing the parsing expressions a function
233 is for (default on).
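
The following sketch shows how these options might be set together before a conversion. The template text, the function name parse_main, the prefix calc_ and the one-rule grammar are all invented for the illustration. A usable framework needs a far more complete template.

```tcl
package require pt::peg::to::cparam

pt::peg::to::cparam reset
pt::peg::to::cparam configure \
    -template      "/* parser for @name@ */\n@code@" \
    -name          tiny \
    -fun-qualifier static \
    -main          parse_main \
    -namespace     calc_ \
    -indent        4 \
    -comments      1

# Hypothetical one-rule grammar "A <- 'a'" in canonical serialization.
set serial [list pt::grammar::peg [dict create \
    rules [dict create A [dict create is [list t a] mode value]] \
    start [list n A]]]

set code [pt::peg::to::cparam convert $serial]
puts $code
```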
234
235 EXAMPLE
236 We are forgoing an example of this representation, with apologies. It
237 would be way too large for this document.
238
PEG SERIALIZATION FORMAT
240 Here we specify the format used by the Parser Tools to serialize Pars‐
241 ing Expression Grammars as immutable values for transport, comparison,
242 etc.
243
244 We distinguish between regular and canonical serializations. While a
245 PEG may have more than one regular serialization only exactly one of
246 them will be canonical.
247
248 regular serialization
249
250 [1] The serialization of any PEG is a nested Tcl dictionary.
251
252 [2] This dictionary holds a single key, pt::grammar::peg, and
253 its value. This value holds the contents of the grammar.
254
255 [3] The contents of the grammar are a Tcl dictionary holding
256 the set of nonterminal symbols and the starting expres‐
257 sion. The relevant keys and their values are
258
259 rules The value is a Tcl dictionary whose keys are the
260 names of the nonterminal symbols known to the
261 grammar.
262
263 [1] Each nonterminal symbol may occur only
264 once.
265
266 [2] The empty string is not a legal nonterminal
267 symbol.
268
269 [3] The value for each symbol is a Tcl dictio‐
270 nary itself. The relevant keys and their
271 values in this dictionary are
272
273 is The value is the serialization of
274 the parsing expression describing
275 the symbol's sentential structure, as
276 specified in the section PE serial‐
277 ization format.
278
279 mode The value can be one of three values
280 specifying how a parser should han‐
281 dle the semantic value produced by
282 the symbol.
283
284 value The semantic value of the
285 nonterminal symbol is an
286 abstract syntax tree consisting
287 of a single node for
288 the nonterminal itself, which
289 has the ASTs of the symbol's
290 right hand side as its chil‐
291 dren.
292
293 leaf The semantic value of the
294 nonterminal symbol is an
295 abstract syntax tree consisting
296 of a single node for
297 the nonterminal, without any
298 children. Any ASTs generated
299 by the symbol's right hand
300 side are discarded.
301
302 void The nonterminal has no seman‐
303 tic value. Any ASTs generated
304 by the symbol's right hand
305 side are discarded (as well).
306
307 start The value is the serialization of the start pars‐
308 ing expression of the grammar, as specified in the
309 section PE serialization format.
310
311 [4] The terminal symbols of the grammar are specified implic‐
312 itly as the set of all terminal symbols used in the start
313 expression and on the RHS of the grammar rules.
314
315 canonical serialization
316 The canonical serialization of a grammar has the format as spec‐
317 ified in the previous item, and then additionally satisfies the
318 constraints below, which make it unique among all the possible
319 serializations of this grammar.
320
321 [1] The keys found in all the nested Tcl dictionaries are
322 sorted in ascending dictionary order, as generated by
323 Tcl's builtin command lsort -increasing -dict.
324
325 [2] The string representation of the value is the canonical
326 representation of a Tcl dictionary. I.e. it does not con‐
327 tain superfluous whitespace.
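
The key-ordering constraint can be checked with a few lines of plain Tcl. The helper names keys_sorted and peg_keys_canonical are hypothetical and not part of Parser Tools. The sketch covers only constraint [1] and does not look at whitespace.

```tcl
# True if the keys of dictionary d are in ascending dictionary order.
proc keys_sorted {d} {
    set keys [dict keys $d]
    expr {$keys eq [lsort -increasing -dict $keys]}
}

# Check the key order of the grammar dictionary, the rules dictionary,
# and the dictionary of each individual rule.
proc peg_keys_canonical {serial} {
    lassign $serial tag contents
    if {$tag ne "pt::grammar::peg"} { return 0 }
    if {![keys_sorted $contents]} { return 0 }
    set rules [dict get $contents rules]
    if {![keys_sorted $rules]} { return 0 }
    dict for {name def} $rules {
        if {![keys_sorted $def]} { return 0 }
    }
    return 1
}

set good {pt::grammar::peg {rules {A {is {t a} mode value}} start {n A}}}
set bad  {pt::grammar::peg {start {n A} rules {A {mode value is {t a}}}}}
puts [peg_keys_canonical $good]
puts [peg_keys_canonical $bad]
```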
328
329 EXAMPLE
330 Assuming the following PEG for simple mathematical expressions
331
332 PEG calculator (Expression)
333 Digit <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
334 Sign <- '-' / '+' ;
335 Number <- Sign? Digit+ ;
336 Expression <- Term (AddOp Term)* ;
337 MulOp <- '*' / '/' ;
338 Term <- Factor (MulOp Factor)* ;
339 AddOp <- '+'/'-' ;
340 Factor <- '(' Expression ')' / Number ;
341 END;
342
343
344 then its canonical serialization (except for whitespace) is
345
346 pt::grammar::peg {
347 rules {
348 AddOp {is {/ {t +} {t -}} mode value}
349 Digit {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
350 Expression {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
351 Factor {is {/ {x {t (} {n Expression} {t )}} {n Number}} mode value}
352 MulOp {is {/ {t *} {t /}} mode value}
353 Number {is {x {? {n Sign}} {+ {n Digit}}} mode value}
354 Sign {is {/ {t -} {t +}} mode value}
355 Term {is {x {n Factor} {* {x {n MulOp} {n Factor}}}} mode value}
356 }
357 start {n Expression}
358 }
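
A serialization of this shape can be assembled with Tcl's list and dict commands instead of being typed by hand. The sketch below does so for a two-rule subset of the calculator grammar. The procedure names canonical_rules and peg_serial are hypothetical helpers, not commands of Parser Tools.

```tcl
# Emit the rules with symbols in dictionary order and the keys of each
# rule in the order is, mode.
proc canonical_rules {rules} {
    set out {}
    foreach name [lsort -increasing -dict [dict keys $rules]] {
        set rule [dict get $rules $name]
        lappend out $name [list is [dict get $rule is] mode [dict get $rule mode]]
    }
    return $out
}

# Wrap rules and start expression into the pt::grammar::peg dictionary.
proc peg_serial {rules start} {
    list pt::grammar::peg [list rules [canonical_rules $rules] start $start]
}

# Rules may be entered in any order; Sign is deliberately added first.
set rules [dict create]
dict set rules Sign  [dict create mode value is [list / [list t -] [list t +]]]
dict set rules AddOp [dict create mode value is [list / [list t +] [list t -]]]

set serial [peg_serial $rules [list n Sign]]
puts $serial
```

Because the result is built from lists, its string representation contains no superfluous whitespace.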
359
360
PE SERIALIZATION FORMAT
362 Here we specify the format used by the Parser Tools to serialize Pars‐
363 ing Expressions as immutable values for transport, comparison, etc.
364
365 We distinguish between regular and canonical serializations. While a
366 parsing expression may have more than one regular serialization only
367 exactly one of them will be canonical.
368
369 Regular serialization
370
371 Atomic Parsing Expressions
372
373 [1] The string epsilon is an atomic parsing expres‐
374 sion. It matches the empty string.
375
376 [2] The string dot is an atomic parsing expression. It
377 matches any character.
378
379 [3] The string alnum is an atomic parsing expression.
380 It matches any Unicode alphabet or digit charac‐
381 ter. This is a custom extension of PEs based on
382 Tcl's builtin command string is.
383
384 [4] The string alpha is an atomic parsing expression.
385 It matches any Unicode alphabet character. This is
386 a custom extension of PEs based on Tcl's builtin
387 command string is.
388
389 [5] The string ascii is an atomic parsing expression.
390 It matches any Unicode character below U+0080. This
391 is a custom extension of PEs based on Tcl's
392 builtin command string is.
393
394 [6] The string control is an atomic parsing expres‐
395 sion. It matches any Unicode control character.
396 This is a custom extension of PEs based on Tcl's
397 builtin command string is.
398
399 [7] The string digit is an atomic parsing expression.
400 It matches any Unicode digit character. Note that
401 this includes characters outside of the [0..9]
402 range. This is a custom extension of PEs based on
403 Tcl's builtin command string is.
404
405 [8] The string graph is an atomic parsing expression.
406 It matches any Unicode printing character, except
407 for space. This is a custom extension of PEs based
408 on Tcl's builtin command string is.
409
410 [9] The string lower is an atomic parsing expression.
411 It matches any Unicode lower-case alphabet charac‐
412 ter. This is a custom extension of PEs based on
413 Tcl's builtin command string is.
414
415 [10] The string print is an atomic parsing expression.
416 It matches any Unicode printing character, includ‐
417 ing space. This is a custom extension of PEs based
418 on Tcl's builtin command string is.
419
420 [11] The string punct is an atomic parsing expression.
421 It matches any Unicode punctuation character. This
422 is a custom extension of PEs based on Tcl's
423 builtin command string is.
424
425 [12] The string space is an atomic parsing expression.
426 It matches any Unicode space character. This is a
427 custom extension of PEs based on Tcl's builtin
428 command string is.
429
430 [13] The string upper is an atomic parsing expression.
431 It matches any Unicode upper-case alphabet charac‐
432 ter. This is a custom extension of PEs based on
433 Tcl's builtin command string is.
434
435 [14] The string wordchar is an atomic parsing expres‐
436 sion. It matches any Unicode word character. This
437 is any alphanumeric character (see alnum), and any
438 connector punctuation characters (e.g. under‐
439 score). This is a custom extension of PEs based on
440 Tcl's builtin command string is.
441
442 [15] The string xdigit is an atomic parsing expression.
443 It matches any hexadecimal digit character. This
444 is a custom extension of PEs based on Tcl's
445 builtin command string is.
446
447 [16] The string ddigit is an atomic parsing expression.
448 It matches any decimal digit character. This is a
449 custom extension of PEs based on Tcl's builtin
450 command regexp.
451
452 [17] The expression [list t x] is an atomic parsing
453 expression. It matches the terminal string x.
454
455 [18] The expression [list n A] is an atomic parsing
456 expression. It matches the nonterminal A.
457
458 Combined Parsing Expressions
459
460 [1] For parsing expressions e1, e2, ... the result of
461 [list / e1 e2 ... ] is a parsing expression as
462 well. This is the ordered choice, aka prioritized
463 choice.
464
465 [2] For parsing expressions e1, e2, ... the result of
466 [list x e1 e2 ... ] is a parsing expression as
467 well. This is the sequence.
468
469 [3] For a parsing expression e the result of [list *
470 e] is a parsing expression as well. This is the
471 kleene closure, describing zero or more repeti‐
472 tions.
473
474 [4] For a parsing expression e the result of [list +
475 e] is a parsing expression as well. This is the
476 positive kleene closure, describing one or more
477 repetitions.
478
479 [5] For a parsing expression e the result of [list &
480 e] is a parsing expression as well. This is the
481 and lookahead predicate.
482
483 [6] For a parsing expression e the result of [list !
484 e] is a parsing expression as well. This is the
485 not lookahead predicate.
486
487 [7] For a parsing expression e the result of [list ?
488 e] is a parsing expression as well. This is the
489 optional input.
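
Since every parsing expression is a Tcl list whose first element names the operator, it can be processed with a short recursive procedure. The sketch below collects the nonterminal symbols used in an expression. The procedure name pe_nonterminals is hypothetical, and operators not listed above are simply treated as having no children.

```tcl
# Return the nonterminals referenced by a serialized parsing expression,
# in order of first occurrence. The operator is the first list element;
# atomic forms such as epsilon or dot have no further elements.
proc pe_nonterminals {pe} {
    set rest [lassign $pe op]
    switch -exact -- $op {
        n { return [list [lindex $rest 0]] }
        t { return {} }
        / - x - * - + - & - ! - ? {
            set result {}
            foreach child $rest {
                foreach sym [pe_nonterminals $child] {
                    if {$sym ni $result} { lappend result $sym }
                }
            }
            return $result
        }
        default { return {} }
    }
}

set pe {x {n Term} {* {x {n AddOp} {n Term}}}}
puts [pe_nonterminals $pe]
```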
490
491 Canonical serialization
492 The canonical serialization of a parsing expression has the for‐
493 mat as specified in the previous item, and then additionally
494 satisfies the constraints below, which make it unique among all
495 the possible serializations of this parsing expression.
496
497 [1] The string representation of the value is the canonical
498 representation of a pure Tcl list. I.e. it does not con‐
499 tain superfluous whitespace.
500
501 [2] Terminals are not encoded as ranges (where start and end
502 of the range are identical).
503
504 EXAMPLE
505 Assuming the parsing expression shown on the right-hand side of the
506 rule
507
508 Expression <- Term (AddOp Term)*
509
510
511 then its canonical serialization (except for whitespace) is
512
513 {x {n Term} {* {x {n AddOp} {n Term}}}}
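
The same value can be produced with nested calls to list, which yields the whitespace-free string form directly. The sketch also contrasts it with a hand-written string containing extra spaces, which is the same list but not the same string. The variable names are chosen for this example only.

```tcl
set term  [list n Term]
set addop [list n AddOp]

# Sequence of Term followed by zero or more (AddOp Term) pairs.
set pe [list x $term [list * [list x $addop $term]]]
puts $pe

# A string with extra spaces describes the same list, but its string
# representation differs from the one generated by list.
set sloppy "x  {n Term}   {* {x {n AddOp} {n Term}}}"
puts [expr {$sloppy eq $pe}]
puts [expr {[list {*}$sloppy] eq $pe}]
```

The outer braces shown in the example above appear only when the expression is embedded in an enclosing list or dictionary.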
514
515
517 This document, and the package it describes, will undoubtedly contain
518 bugs and other problems. Please report such in the category pt of the
519 Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
520 report any ideas for enhancements you may have for either package
521 and/or documentation.
522
523 When proposing code changes, please provide unified diffs, i.e. the output
524 of diff -u.
525
526 Note further that attachments are strongly preferred over inlined
527 patches. Attachments can be made by going to the Edit form of the
528 ticket immediately after its creation, and then using the left-most
529 button in the secondary navigation bar.
530
532 CPARAM, EBNF, LL(k), PEG, TDPL, context-free languages, conversion,
533 expression, format conversion, grammar, matching, parser, parsing
534 expression, parsing expression grammar, push down automaton, recursive
535 descent, serialization, state, top-down parsing languages, transducer
536
538 Parsing and Grammars
539
541 Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>
542
543
544
545
546tcllib 1.1.2 pt::peg::to::cparam(n)