pt(n)                         Parser Tools                         pt(n)

______________________________________________________________________________

NAME
       pt - Parser Tools Application

SYNOPSIS
       package require Tcl 8.5

       pt generate resultformat ?options...? resultfile inputformat inputfile

______________________________________________________________________________

DESCRIPTION
       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. That document is the entry point to the whole system
       the current package is a part of.

       This document describes pt, the main application of the module, a
       parser generator. Its intended audience is people who wish to create
       a parser for some language of theirs. Should you wish to modify the
       application instead, please see the section about the application's
       Internals for the basic references.

       It resides in the User Application Layer of Parser Tools.

       IMAGE: arch_user_app

COMMAND LINE
       pt generate resultformat ?options...? resultfile inputformat inputfile
              This sub-command of the application reads the parsing
              expression grammar stored in the inputfile in the format
              inputformat, converts it to the resultformat under the
              direction of the (format-specific) set of options specified by
              the user, and stores the result in the resultfile.

              The inputfile has to exist, while the resultfile may be
              created, overwriting any pre-existing content of the file. Any
              missing directory in the path to the resultfile will be created
              as well.

              The exact form of the result for, and the set of options
              supported by, the known result-formats are explained in the
              upcoming sections of this document, with the list below
              providing an index mapping between format name and its
              associated section. In alphabetical order:

              c      A resultformat. See section C Parser.

              container
                     A resultformat. See section Grammar Container.

              critcl A resultformat. See section C Parser Embedded In Tcl.

              json   An input- and resultformat. See section JSON Grammar
                     Exchange.

              oo     A resultformat. See section TclOO Parser.

              peg    An input- and resultformat. See section PEG
                     Specification Language.

              snit   A resultformat. See section Snit Parser.

              Of the seven possible results, four are parsers outright (c,
              critcl, oo, and snit), one (container) provides code which can
              be used in conjunction with a generic parser (also known as a
              grammar interpreter), and the last two (json and peg) do double
              duty as input formats, allowing the transformation of grammars
              for exchange, reformatting, and the like.

              The created parsers fall into three categories:

                                    + --- C ---> critcl, c
                                    |
                 + --- specialized -+
                 |                  |
              ---+                  + --- Tcl -> snit, oo
                 |
                 + --- interpreted (Tcl) ------> container

              Specialized parsers implemented in C
                     The fastest parsers are created when using the result
                     formats c and critcl. The former returns the raw C code
                     for the parser, while the latter wraps it into a Tcl
                     package using CriTcl.

                     This makes the latter much easier to use than the
                     former. On the other hand, the former can be adapted to
                     the users' requirements through a multitude of options,
                     allowing for things like usage of the parser outside of
                     a Tcl environment, something the critcl format doesn't
                     support. As such the c format is meant for more advanced
                     users, or users with special needs.

                     A disadvantage of all the parsers in this section is the
                     need to run them through a C compiler to make them
                     actually executable, and not everyone has the necessary
                     tools. The parsers in the next section are for people
                     under such restrictions.

              Specialized parsers implemented in Tcl
                     As the parsers in this section are implemented in Tcl,
                     they are quite a bit slower than anything from the
                     previous section. On the other hand this allows them to
                     be used in pure-Tcl environments, or in environments
                     which allow only a limited set of binary packages. In
                     the latter case it will be advantageous to lobby for the
                     inclusion of the C-based runtime support (notes below)
                     into the environment, to reduce the speed penalty that
                     running in Tcl imposes on these parsers.

                     The relevant formats are snit and oo. Both place their
                     result into a Tcl package containing a snit::type, or a
                     TclOO class, respectively.

                     Of the supporting runtime, which is the package pt::rde,
                     the user has to know nothing but that it exists and that
                     the parsers depend on it. Knowledge of the API exported
                     by the runtime for the parsers' consumption is not
                     required by the parsers' users.

              Interpreted parsing implemented in Tcl
                     The last category is grammar interpretation. This means
                     that an interpreter for parsing expression grammars
                     takes the description of the grammar to parse input for,
                     and uses it to guide the parsing process. This is the
                     slowest of the available options, as the interpreter has
                     to continually run through the configured grammar,
                     whereas the specialized parsers of the previous sections
                     have the relevant knowledge about the grammar baked into
                     them.

                     The only place where using interpretation makes sense is
                     where the grammar for some input may be changed
                     interactively by the user, as interpretation allows for
                     quick turnaround after each change, whereas the previous
                     methods require the generation of a whole new parser,
                     which is not as fast. On the other hand, wherever the
                     grammar to use is fixed, the previous methods are much
                     more advantageous, as the time to generate the parser is
                     minuscule compared to the time the parser code is in
                     use.

                     The relevant result format is container. It (quickly)
                     generates grammar descriptions (instead of a full
                     parser) which match the API expected by the Parser
                     Tools' grammar interpreter. The latter is provided by
                     the package pt::peg::interp.

              All the parsers generated by critcl, snit, and oo, and the
              grammar interpreter share a common API for access to the actual
              parsing functionality, making them all plug-compatible. It is
              described in the Parser API specification document.
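Plug-compatibility means that code written against the Parser API does not need to know which kind of parser it was handed. The following is a minimal sketch that assumes only the parset method of that API, which parses a string and returns its AST. The helper parseText and the stand-in command stubParser are hypothetical and exist only for this illustration. In practice an instance created from a critcl, snit, or oo package, or a pt::peg::interp object, would be passed instead.

```tcl
# Hypothetical helper: parse a string with any object that implements the
# pt Parser API. Because all generated parsers and the grammar interpreter
# share that API, this code does not care which kind it is given.
proc parseText {parser text} {
    return [$parser parset $text]
}

# Hypothetical stand-in for a real parser object, for illustration only.
# It answers "parset" with a single AST node spanning the whole input.
proc stubParser {method args} {
    switch -- $method {
        parset {
            set text [lindex $args 0]
            return [list Input 0 [expr {[string length $text] - 1}]]
        }
        default {
            return -code error "unsupported method \"$method\""
        }
    }
}

puts [parseText stubParser "120+5"]
```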

PEG SPECIFICATION LANGUAGE
       peg, a language for the specification of parsing expression grammars,
       is meant to be human readable, and writable as well, yet strict
       enough to allow its processing by machine, like any computer
       language. It was defined to make writing the specification of a
       grammar easy, something the other formats found in the Parser Tools
       do not lend themselves to.

       For either an introduction to or the formal specification of the
       language, please go and read the PEG Language Tutorial.

       When used as a result-format this format supports the following
       options:

       -file string
              The value of this option is the name of the file or other
              entity the processed grammar came from. The default value is
              unknown.

       -name string
              The value of this option is the name of the grammar being
              processed. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for whom the
              command is run. The default value is unknown.

       -template string
              The value of this option is a string into which to put the
              generated text and the values of the other options. The
              various locations for user-data are expected to be specified
              with the placeholders listed below. The default value is
              "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant PEG.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @code@ To be replaced with the generated text.
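As an illustration, the placeholder substitution can be modelled as a single string map over the template. This is a sketch under that assumption, not the actual implementation. The procedure fillPegTemplate and the sample template are hypothetical; only the placeholder names and default values come from the option list above.

```tcl
# Hypothetical sketch (not pt's actual implementation) of how the -template
# option of the peg result-format can be understood: all placeholders are
# replaced in one pass of [string map] over the template.
proc fillPegTemplate {template code args} {
    # Start from the documented defaults; "-option value" pairs override them.
    set opts [dict merge {-user unknown -file unknown -name a_pe_grammar} $args]
    return [string map [list \
        @user@   [dict get $opts -user] \
        @format@ PEG \
        @file@   [dict get $opts -file] \
        @name@   [dict get $opts -name] \
        @code@   $code] $template]
}

# A template wrapping the generated text in a comment header.
set template "# @format@ grammar @name@ (from @file@, for @user@)\n@code@"
set generatedText "<generated PEG text>"
puts [fillPegTemplate $template $generatedText \
          -name calculator -file calculator.peg]
```

Because string map makes a single pass, text inserted for one placeholder is not itself scanned for further placeholders.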

JSON GRAMMAR EXCHANGE
       The json format for parsing expression grammars was written as a data
       exchange format not bound to Tcl. It was defined to allow the
       exchange of grammars with PackRat/PEG-based parser generators for
       other languages.

       For the formal specification of the JSON grammar exchange format,
       please go and read The JSON Grammar Exchange Format.

       When used as a result-format this format supports the following
       options:

       -file string
              The value of this option is the name of the file or other
              entity the processed grammar came from. The default value is
              unknown.

       -name string
              The value of this option is the name of the grammar being
              processed. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for whom the
              command is run. The default value is unknown.

       -indented boolean
              If this option is set the system will break the generated JSON
              across lines and indent it according to its inner structure,
              with each key of a dictionary on a separate line.

              If the option is not set (the default), the whole JSON object
              will be written on a single line, with minimum spacing between
              all elements.

       -aligned boolean
              If this option is set the system will ensure that the values
              for the keys in a dictionary are vertically aligned with each
              other, for a nice table effect. To make this work this option
              also implies that -indented is set.

              If the option is not set (the default), the output is formatted
              as per the value of -indented, without trying to align the
              values for dictionary keys.
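The effect of -indented and -aligned can be pictured with the following sketch. It handles only a flat dictionary of string values. The procedures jsonString and layoutDict are hypothetical; they are not the writer used by pt, whose exact spacing may differ.

```tcl
# Hypothetical illustration of the three JSON layouts described above, for a
# flat dictionary of string values only. This is NOT pt's JSON writer; it
# only mimics the effect of the -indented and -aligned options.
proc jsonString {s} {
    # Escape backslashes and double quotes, then wrap in quotes.
    return "\"[string map {\\ \\\\ \" \\\"} $s]\""
}

proc layoutDict {dict {indented 0} {aligned 0}} {
    if {$aligned} { set indented 1 }   ;# -aligned implies -indented
    if {!$indented} {
        # Default: a single line with minimal spacing.
        set parts {}
        dict for {k v} $dict { lappend parts "[jsonString $k]:[jsonString $v]" }
        return "{[join $parts ,]}"
    }
    # Width of the longest quoted key, used for alignment.
    set width 0
    if {$aligned} {
        foreach k [dict keys $dict] {
            set width [expr {max($width, [string length [jsonString $k]])}]
        }
    }
    set lines {}
    dict for {k v} $dict {
        set key [jsonString $k]
        if {$aligned} { set key [format %-*s $width $key] }
        lappend lines "    $key : [jsonString $v]"
    }
    return "{\n[join $lines ",\n"]\n}"
}

set d {name calculator start Expression}
puts [layoutDict $d]        ;# single line
puts [layoutDict $d 1]      ;# indented
puts [layoutDict $d 0 1]    ;# aligned (implies indented)
```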

C PARSER EMBEDDED IN TCL
       The critcl format is executable code, a parser for the grammar. It is
       a Tcl package with the actual parser implementation written in C and
       embedded in Tcl via the critcl package.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other
              entity the processed grammar came from. The default value is
              unknown.

       -name string
              The value of this option is the name of the grammar being
              processed. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for whom the
              command is run. The default value is unknown.

       -class string
              The value of this option is the name of the class to generate,
              without leading colons. The default value is CLASS.

              For a simple value X without colons, like CLASS, the parser
              command will be X::X, whereas for a namespaced value X::Y the
              parser command will be X::Y.

       -package string
              The value of this option is the name of the package to
              generate. The default value is PACKAGE.

       -version string
              The value of this option is the version of the package to
              generate. The default value is 1.
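The naming rule for the -class option can be expressed as a small helper. The procedure critclParserCommand is hypothetical and only restates the rule described above.

```tcl
# Hypothetical helper mirroring the rule stated for the critcl -class option:
# a plain name X yields the parser command X::X, a namespaced X::Y stays X::Y.
proc critclParserCommand {class} {
    if {[string first :: $class] < 0} {
        return ${class}::$class
    }
    return $class
}

foreach class {CLASS calculator myparsers::calculator} {
    puts "$class -> [critclParserCommand $class]"
}
```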

C PARSER
       The c format is executable code, a parser for the grammar. The parser
       implementation is written in C and can be tweaked to the users' needs
       through a multitude of options.

       The critcl format, for example, is implemented as a canned
       configuration of these options on top of the generator for c.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other
              entity the processed grammar came from. The default value is
              unknown.

       -name string
              The value of this option is the name of the grammar being
              processed. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for whom the
              command is run. The default value is unknown.

       -template string
              The value of this option is a string into which to put the
              generated text and the other configuration settings. The
              various locations for user-data are expected to be specified
              with the placeholders listed below. The default value is
              "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant C/PARAM.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @code@ To be replaced with the generated code.

              The following placeholders are special, in that they will occur
              within the generated code, and are replaced there as well.

              @statedecl@
                     To be replaced with the value of the option -state-decl.

              @stateref@
                     To be replaced with the value of the option -state-ref.

              @strings@
                     To be replaced with the value of the option
                     -string-varname.

              @self@ To be replaced with the value of the option
                     -self-command.

              @def@  To be replaced with the value of the option
                     -fun-qualifier.

              @ns@   To be replaced with the value of the option -namespace.

              @main@ To be replaced with the value of the option -main.

              @prelude@
                     To be replaced with the value of the option -prelude.

       -state-decl string
              A C string representing the argument declaration to use in the
              generated parsing functions to refer to the parsing state. In
              essence, type and argument name. The default value is the
              string RDE_PARAM p.

       -state-ref string
              A C string representing the argument name used in the
              generated parsing functions to refer to the parsing state. The
              default value is the string p.

       -self-command string
              A C string representing the reference needed to call a
              generated parser function (method, ...) from another parser
              function, per the chosen framework (template). The default
              value is the empty string.

       -fun-qualifier string
              A C string containing the attributes to give to the generated
              functions (methods, ...), per the chosen framework (template).
              The default value is static.

       -namespace string
              The name of the C namespace the parser functions (methods, ...)
              shall reside in, or a general prefix to add to the function
              names. The default value is the empty string.

       -main string
              The name of the main function (method, ...) to be called by the
              chosen framework (template) to start parsing input. The default
              value is __main.

       -string-varname string
              The name of the variable used for the table of strings used by
              the generated parser, i.e. error messages, symbol names, etc.
              The default value is p_string.

       -prelude string
              A snippet of code to be inserted at the head of each generated
              parsing function. The default value is the empty string.

       -indent integer
              The number of characters to indent each line of the generated
              code by. The default value is 0.

       -comments boolean
              A flag controlling the generation of code comments containing
              the original parsing expression a parsing function is for. The
              default value is on.
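The interaction of the general and the special placeholders can be sketched in two passes. First the generated code is put into the template. Then the special placeholders are resolved over the combined text, so that occurrences inside the generated code are replaced too. The procedure fillCTemplate and the C fragment are hypothetical; the default values are the documented ones.

```tcl
# Hypothetical sketch (not pt's actual code) of the placeholder handling for
# the c result-format. The defaults are the documented option defaults.
proc fillCTemplate {template code args} {
    set o [dict merge {
        -user unknown -file unknown -name a_pe_grammar
        -state-decl {RDE_PARAM p} -state-ref p -string-varname p_string
        -self-command {} -fun-qualifier static -namespace {} -main __main
        -prelude {}
    } $args]
    # Pass 1: put the generated code and the general settings into the template.
    set text [string map [list \
        @user@ [dict get $o -user] @format@ C/PARAM \
        @file@ [dict get $o -file] @name@ [dict get $o -name] \
        @code@ $code] $template]
    # Pass 2: the special placeholders may occur anywhere, including inside
    # the generated code, so they are resolved on the combined text.
    return [string map [list \
        @statedecl@ [dict get $o -state-decl] \
        @stateref@  [dict get $o -state-ref] \
        @strings@   [dict get $o -string-varname] \
        @self@      [dict get $o -self-command] \
        @def@       [dict get $o -fun-qualifier] \
        @ns@        [dict get $o -namespace] \
        @main@      [dict get $o -main] \
        @prelude@   [dict get $o -prelude]] $text]
}

# A made-up fragment in the style of generated code, using the placeholders.
set code {@def@ void @ns@@main@ (@statedecl@) { @prelude@ }}
puts [fillCTemplate "/* @name@ */\n@code@" $code -namespace calc_]
```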

SNIT PARSER
       The snit format is executable code, a parser for the grammar. It is a
       Tcl package holding a snit::type, i.e. a class, whose instances are
       parsers for the input grammar.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other
              entity the processed grammar came from. The default value is
              unknown.

       -name string
              The value of this option is the name of the grammar being
              processed. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for whom the
              command is run. The default value is unknown.

       -class string
              The value of this option is the name of the class to generate,
              without leading colons. Note that it serves double duty as the
              name of the package to generate too, if option -package is not
              specified, see below. The default value is CLASS, applying if
              neither option -class nor -package was specified.

       -package string
              The value of this option is the name of the package to
              generate, without leading colons. Note that it serves double
              duty as the name of the class to generate too, if option -class
              is not specified, see above. The default value is PACKAGE,
              applying if neither option -package nor -class was specified.

       -version string
              The value of this option is the version of the package to
              generate. The default value is 1.
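The interplay of the -class and -package defaults can be summarized as follows. The procedure resolveNames is a hypothetical restatement of the rules above, with an empty string standing for an option that was not given. The same rules are documented for the oo format.

```tcl
# Hypothetical sketch of how the documented -class/-package defaults
# interact for the snit (and oo) result-formats. An option that was not
# given is represented by an empty string.
proc resolveNames {class package} {
    if {$class eq "" && $package eq ""} {
        # Neither given: both documented defaults apply.
        return [list CLASS PACKAGE]
    }
    if {$class eq ""}   { set class $package }   ;# -package doubles as class
    if {$package eq ""} { set package $class }   ;# -class doubles as package
    return [list $class $package]
}

puts [resolveNames calculator ""]   ;# prints: calculator calculator
```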

TCLOO PARSER
       The oo format is executable code, a parser for the grammar. It is a
       Tcl package holding a TclOO class, whose instances are parsers for the
       input grammar.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other
              entity the processed grammar came from. The default value is
              unknown.

       -name string
              The value of this option is the name of the grammar being
              processed. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for whom the
              command is run. The default value is unknown.

       -class string
              The value of this option is the name of the class to generate,
              without leading colons. Note that it serves double duty as the
              name of the package to generate too, if option -package is not
              specified, see below. The default value is CLASS, applying if
              neither option -class nor -package was specified.

       -package string
              The value of this option is the name of the package to
              generate, without leading colons. Note that it serves double
              duty as the name of the class to generate too, if option -class
              is not specified, see above. The default value is PACKAGE,
              applying if neither option -package nor -class was specified.

       -version string
              The value of this option is the version of the package to
              generate. The default value is 1.
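Instances of a TclOO class are created with the class's standard new (or create) method and disposed of with destroy. The class digitsParser below is a hypothetical toy, not generated code. It only implements a parset method returning an AST node, to show that lifecycle for an object used the way a parser is used.

```tcl
package require TclOO

# Hypothetical stand-in for an oo-format parser class, for illustration only:
# a real generated class implements the full Parser API and parses the
# grammar it was generated from. This toy only accepts a run of digits.
oo::class create digitsParser {
    method parset {text} {
        if {![regexp {^[0-9]+$} $text]} {
            return -code error "syntax error in \"$text\""
        }
        set last [expr {[string length $text] - 1}]
        return [list Number 0 $last]
    }
}

# TclOO classes create instances with "new" (auto-named) or "create NAME";
# instances are disposed of with "destroy".
set parser [digitsParser new]
puts [$parser parset 120]   ;# prints: Number 0 2
$parser destroy
```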

GRAMMAR CONTAINER
       The container format is another form of describing parsing expression
       grammars. While data in this format is executable, it does not
       constitute a parser for the grammar. It always has to be used in
       conjunction with the package pt::peg::interp, a grammar interpreter.

       The format represents grammars by a snit::type, i.e. a class, whose
       instances are API-compatible with the instances of the
       pt::peg::container package, and which are preloaded with the grammar
       in question.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other
              entity the processed grammar came from. The default value is
              unknown.

       -name string
              The value of this option is the name of the grammar being
              processed. The default value is a_pe_grammar.

       -user string
              The value of this option is the name of the user for whom the
              command is run. The default value is unknown.

       -mode bulk|incremental
              The value of this option controls which methods of
              pt::peg::container instances are used to specify the grammar,
              i.e. to preload it into the container. There are two legal
              values, as listed below. The default is bulk.

              bulk   In this mode the methods start, add, modes, and rules are
                     used to specify the grammar in a bulk manner, i.e. as a
                     set of nonterminal symbols, and two dictionaries mapping
                     from the symbols to their semantic modes and parsing
                     expressions.

                     This mode is the default.

              incremental
                     In this mode the methods start, add, mode, and rule are
                     used to specify the grammar piecemeal, with each
                     nonterminal having its own block of defining commands.

       -template string
              The value of this option is a string into which to put the
              generated code and the other configuration settings. The
              various locations for user-data are expected to be specified
              with the placeholders listed below. The default value is
              "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant CONTAINER.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @mode@ To be replaced with the value of the option -mode.

              @code@ To be replaced with the generated code.
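The difference between the two -mode values is in the shape of the calls used to preload the container. The sketch below logs those calls instead of performing them. The commands recorder and preload are hypothetical. The argument shapes, such as {n Expression} for the start expression and the parsing expressions in the sample rules, are assumptions for illustration and are treated as opaque values.

```tcl
# Hypothetical sketch contrasting the two -mode layouts. The command
# "recorder" is a stand-in that only logs calls; a generated container would
# instead invoke these methods on its pt::peg::container instance. The exact
# argument shapes shown here are assumptions for illustration.
set calls {}
proc recorder {method args} {
    lappend ::calls [list $method {*}$args]
}

# rules: nonterminal -> parsing expression; modes: nonterminal -> semantic mode
proc preload {mode start rules modes} {
    recorder start $start
    if {$mode eq "bulk"} {
        # One call per aspect, covering all nonterminals at once.
        recorder add {*}[dict keys $rules]
        recorder modes $modes
        recorder rules $rules
    } else {
        # incremental: a separate block of calls for each nonterminal.
        dict for {nt pe} $rules {
            recorder add $nt
            recorder mode $nt [dict get $modes $nt]
            recorder rule $nt $pe
        }
    }
}

set rules {AddOp {/ {t +} {t -}} MulOp {/ {t *} {t /}}}
set modes {AddOp value MulOp value}
preload bulk {n Expression} $rules $modes
puts [llength $calls]     ;# 4 calls in bulk mode
set calls {}
preload incremental {n Expression} $rules $modes
puts [llength $calls]     ;# 1 + 3 per nonterminal = 7 calls
```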

EXAMPLE
       In this section we work through a complete example, starting with a
       PEG grammar and ending with running the parser generated from it over
       some input, following the outline shown in the figure below:

       IMAGE: flow

       Our grammar, assumed to be stored in the file "calculator.peg", is

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
                  Sign       <- '-' / '+' ;
                  Number     <- Sign? Digit+ ;
                  Expression <- Term (AddOp Term)* ;
                  MulOp      <- '*' / '/' ;
                  Term       <- Factor (MulOp Factor)* ;
                  AddOp      <- '+'/'-' ;
                  Factor     <- '(' Expression ')' / Number ;
              END;

       From this we create a snit-based parser via

              pt generate snit calculator.tcl -class calculator -name calculator peg calculator.peg

       which leaves us with the parser package and class written to the file
       "calculator.tcl". Assuming that this package is then properly
       installed in a place where Tcl can find it, we can now use this class
       via a script like

              package require calculator

              # The file to parse is given as the first command line argument.
              lassign $argv input
              set channel [open $input r]

              # Create a parser, run it over the channel, then clean up.
              set parser [calculator]
              set ast [$parser parse $channel]
              $parser destroy
              close $channel

              # ... now process the returned abstract syntax tree ...

       where the abstract syntax tree stored in the variable will look like

              set ast {Expression 0 4
                  {Term 0 2
                      {Factor 0 2
                          {Number 0 2
                              {Digit 0 0}
                              {Digit 1 1}
                              {Digit 2 2}
                          }
                      }
                  }
                  {AddOp 3 3}
                  {Term 4 4
                      {Factor 4 4
                          {Number 4 4
                              {Digit 4 4}
                          }
                      }
                  }
              }

       assuming that the input file and channel contained the text

              120+5

       A more graphical representation of the tree would be

                                                                 +- Digit 0 0 | 1
                                                                 |            |
                      +- Term 0 2 --- Factor 0 2 --- Number 0 2 -+- Digit 1 1 | 2
                      |                                          |            |
                      |                                          +- Digit 2 2 | 0
                      |                                                       |
       Expression 0 4 +- AddOp 3 3 ------------------------------------------ | +
                      |                                                       |
                      +- Term 4 4 --- Factor 4 4 --- Number 4 4 --- Digit 4 4 | 5

       Regardless, at this point it is the user's responsibility to work with
       the tree to reach whatever goal she desires, i.e. analyze it,
       transform it, etc. The package pt::ast should be of help here,
       providing commands to walk such AST structures in various ways.
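A useful first step when working with such a tree is to print each node together with the input it covers. The procedure showAst below is hypothetical and is not part of pt::ast. It relies only on the node layout of a pt AST: a list of symbol, start offset, end offset, and child nodes, with inclusive offsets. The tree literal is the one for the input 120+5, shaped per the calculator grammar.

```tcl
# Hypothetical helper (not the pt::ast API): render a pt-style AST as indented
# lines, each showing the nonterminal, its location, and the input it covers.
proc showAst {ast text {depth 0}} {
    # A node is a list: symbol, start offset, end offset, then child nodes.
    set children [lassign $ast symbol start end]
    set covered [string range $text $start $end]
    set lines [list "[string repeat {  } $depth]$symbol $start $end \"$covered\""]
    foreach child $children {
        lappend lines {*}[showAst $child $text [expr {$depth + 1}]]
    }
    return $lines
}

# The tree for the input "120+5", shaped per the calculator grammar.
set ast {Expression 0 4
    {Term 0 2 {Factor 0 2 {Number 0 2 {Digit 0 0} {Digit 1 1} {Digit 2 2}}}}
    {AddOp 3 3}
    {Term 4 4 {Factor 4 4 {Number 4 4 {Digit 4 4}}}}
}
puts [join [showAst $ast "120+5"] \n]
```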

       One important thing to note is that the parsers used here return a
       data structure representing the structure of the input per the
       grammar underlying the parser. There are no callbacks during the
       parsing process, i.e. no parsing actions, as most other parsers have.
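Without parsing actions, whatever a parsing action would compute has to be computed by walking the returned tree afterwards. A minimal sketch of that for the calculator grammar evaluates the expression from the AST and the input text. The procedure calcEval is hypothetical. It assumes that terminals such as the parentheses produce no nodes of their own, so that a parenthesized Factor has its Expression as its only child.

```tcl
# Hypothetical post-parse "action" for the calculator grammar: since the
# generated parsers only return an AST, computing a value is a separate walk.
proc calcEval {node text} {
    # A node is a list: symbol, start offset, end offset, then child nodes.
    set children [lassign $node symbol start end]
    switch -- $symbol {
        Number {
            # scan %d reads the optional sign and digits as a decimal integer.
            return [scan [string range $text $start $end] %d]
        }
        Factor {
            # Either a Number or a parenthesized Expression: one child either way.
            return [calcEval [lindex $children 0] $text]
        }
        Expression - Term {
            # Children alternate: operand (operator operand)*
            set result [calcEval [lindex $children 0] $text]
            foreach {opNode operand} [lrange $children 1 end] {
                set op [string range $text [lindex $opNode 1] [lindex $opNode 2]]
                set value [calcEval $operand $text]
                switch -- $op {
                    + { set result [expr {$result + $value}] }
                    - { set result [expr {$result - $value}] }
                    * { set result [expr {$result * $value}] }
                    / { set result [expr {$result / $value}] }
                }
            }
            return $result
        }
        default {
            return -code error "unexpected node \"$symbol\""
        }
    }
}

set ast {Expression 0 4
    {Term 0 2 {Factor 0 2 {Number 0 2 {Digit 0 0} {Digit 1 1} {Digit 2 2}}}}
    {AddOp 3 3}
    {Term 4 4 {Factor 4 4 {Number 4 4 {Digit 4 4}}}}
}
puts [calcEval $ast "120+5"]   ;# prints 125
```

Note that / on two integers performs Tcl's integer (floor) division.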

       Going back to the last snippet of code, the execution of the parser
       for some input, note how the parser instance follows the specified
       Parser API.

INTERNALS
       This section is intended for users of the application who wish to
       modify or extend it. Users only interested in the generation of
       parsers can ignore it.

       The main functionality of the application is encapsulated in the
       package pt::pgen. Please read its documentation for more information.

BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category pt of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
       report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS
       EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar,
       matching, parser, parsing expression, parsing expression grammar,
       push down automaton, recursive descent, state, top-down parsing
       languages, transducer

CATEGORY
       Parsing and Grammars

COPYRIGHT
       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>


tcllib                                1                                pt(n)