pt(n)                            Parser Tools                            pt(n)

______________________________________________________________________________

NAME

       pt - Parser Tools Application

SYNOPSIS

       package require Tcl 8.5

       pt generate resultformat ?options...? resultfile inputformat inputfile

______________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In
       that case please read the overview provided by the Introduction to
       Parser Tools. That document is the entry point to the whole system the
       current package is a part of.

       This document describes pt, the main application of the module, a
       parser generator. Its intended audience is people who wish to create a
       parser for some language of theirs. Should you wish to modify the
       application instead, please see the section about the application's
       Internals for the basic references.

       It resides in the User Application Layer of Parser Tools.
30
31       IMAGE: arch_user_app
32

COMMAND LINE

34       pt generate resultformat ?options...? resultfile inputformat inputfile
35              This sub-command of the application reads the parsing expression
36              grammar  stored in the inputfile in the format inputformat, con‐
37              verts it to the resultformat under the direction of the (format-
38              specific)  set  of  options specified by the user and stores the
39              result in the resultfile.
40
41              The inputfile has to exist, while the resultfile may be created,
42              overwriting  any  pre-existing  content of the file. Any missing
43              directory in the path to the resultfile will be created as well.
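
              The file handling described above can be sketched in plain Tcl.
              This is only an illustration of the documented behaviour; the
              proc name write_result is hypothetical and not part of pt,
              whose own implementation may differ.

```tcl
# Hypothetical helper, not part of pt: store generated text in a result
# file the way the text above describes it.
proc write_result {resultfile text} {
    # Create any missing directory in the path (a no-op if it exists).
    file mkdir [file dirname $resultfile]
    # Mode "w" creates the file, or truncates pre-existing content.
    set chan [open $resultfile w]
    puts -nonewline $chan $text
    close $chan
}
```

              Calling write_result with a path such as
              build/parsers/calculator.tcl would create build/parsers first
              if it does not exist yet.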
44
45              The exact form of the result for, and the set  of  options  sup‐
46              ported  by the known result-formats, are explained in the upcom‐
47              ing sections of this document, with the list below providing  an
48              index mapping between format name and its associated section. In
49              alphabetical order:
50
51
52              c      A resultformat. See section C Parser.
53
54              container
55                     A resultformat. See section Grammar Container.
56
57              critcl A resultformat. See section C Parser Embedded In Tcl.
58
              json   An input- and resultformat. See section JSON Grammar
                     Exchange.
61
62              oo     A resultformat. See section TclOO Parser.
63
              peg    An input- and resultformat. See section PEG Specification
                     Language.
66
67              snit   A resultformat. See section Snit Parser.
68
69       Of the seven possible results four are parsers outright (c, critcl, oo,
70       and  snit), one (container) provides code which can be used in conjunc‐
71       tion with a generic parser (also known as a grammar  interpreter),  and
72       the  last  two  (json  and peg) are doing double-duty as input formats,
73       allowing the transformation of grammars for exchange, reformatting, and
74       the like.
75
       The created parsers fall into three categories:

                              +--- C ---> critcl, c
                              |
           +--- specialized --+
           |                  |
       ----+                  +--- Tcl -> snit, oo
           |
           +--- interpreted (Tcl) ------> container
81
82       Specialized parsers implemented in C
83              The  fastest parsers are created when using the result formats c
84              and critcl. The first returns the raw C  code  for  the  parser,
85              while the latter wraps it into a Tcl package using CriTcl.
86
87              This makes the latter much easier to use than the former. On the
88              other hand, the former can be adapted to the users' requirements
89              through  a  multitude of options, allowing for things like usage
90              of the parser outside of a Tcl environment, something the critcl
91              format  doesn't  support. As such the c format is meant for more
92              advanced users, or users with special needs.
93
94              A disadvantage of all the parsers in this section is the need to
95              run  them through a C compiler to make them actually executable.
96              This is not something everyone has the necessary tools for.  The
97              parsers  in  the next section are for people under such restric‐
98              tions.
99
100       Specialized parsers implemented in Tcl
              As the parsers in this section are implemented in Tcl they are
              quite a bit slower than anything from the previous section. On
              the other hand this allows them to be used in pure-Tcl
              environments, or in environments which allow only a limited set
              of binary packages. In the latter case it will be advantageous
              to lobby for the inclusion of the C-based runtime support (see
              the notes below) into the environment, to reduce the impact of
              Tcl on the speed of these parsers.
109
110              The  relevant  formats  are snit and oo. Both place their result
111              into a Tcl package  containing  a  snit::type,  or  TclOO  class
112              respectively.
113
114              Of  the  supporting  runtime,  which is the package pt::rde, the
115              user has to know nothing but that it does  exist  and  that  the
116              parsers  are  dependent  on it. Knowledge of the API exported by
117              the runtime for the parsers' consumption is not required by  the
118              parsers' users.
119
120       Interpreted parsing implemented in Tcl
              The last category is grammar interpretation. This means that an
              interpreter for parsing expression grammars takes the
              description of the grammar to parse input for, and uses it to
              guide the parsing process. This is the slowest of the available
              options, as the interpreter has to continually run through the
              configured grammar, whereas the specialized parsers of the
              previous sections have the relevant knowledge about the grammar
              baked into them.
129
              The only place where using interpretation makes sense is where
              the grammar for some input may be changed interactively by the
              user, as interpretation allows for quick turnaround after each
              change, whereas the previous methods require the generation of
              a whole new parser, which is not as fast. On the other hand,
              wherever the grammar to use is fixed, the previous methods are
              much more advantageous, as the time to generate the parser is
              minuscule compared to the time the parser code is in use.
138
139              The relevant result format is container.  It (quickly) generates
140              grammar descriptions (instead of a full parser) which match  the
141              API expected by ParserTools' grammar interpreter.  The latter is
142              provided by the package pt::peg::interp.
143
144       All the parsers generated by critcl, snit,  and  oo,  and  the  grammar
145       interpreter  share  a common API for access to the actual parsing func‐
146       tionality, making them all plug-compatible.  It  is  described  in  the
147       Parser API specification document.
148

PEG SPECIFICATION LANGUAGE

       peg, a language for the specification of parsing expression grammars,
       is meant to be human readable, and writable as well, yet strict enough
       to allow its processing by machine, like any computer language. It was
       defined to make writing the specification of a grammar easy, something
       the other formats found in the Parser Tools do not lend themselves to.
155
156       For  either  an introduction to or the formal specification of the lan‐
157       guage, please go and read the PEG Language Tutorial.
158
159       When used  as  a  result-format  this  format  supports  the  following
160       options:
161
162       -file string
163              The value of this option is the name of the file or other entity
164              from which the grammar came, for which the command is  run.  The
165              default value is unknown.
166
167       -name string
168              The  value of this option is the name of the grammar we are pro‐
169              cessing.  The default value is a_pe_grammar.
170
171       -user string
172              The value of this option is the name of the user for  which  the
173              command is run. The default value is unknown.
174
175       -template string
176              The  value of this option is a string into which to put the gen‐
177              erated text and the values of the  other  options.  The  various
178              locations  for  user-data  are expected to be specified with the
179              placeholders listed below. The default value is "@code@".
180
181              @user@ To be replaced with the value of the option -user.
182
183              @format@
                     To be replaced with the constant PEG.
185
186              @file@ To be replaced with the value of the option -file.
187
188              @name@ To be replaced with the value of the option -name.
189
190              @code@ To be replaced with the generated text.
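
       The placeholder replacement described for -template can be pictured
       with a small plain-Tcl sketch. The proc fill_template and its argument
       layout are assumptions made for illustration; this is not the code pt
       itself uses. The defaults mirror the option defaults listed above.

```tcl
# Illustrative sketch only: fill a -template string with the generated
# text and the values of the other options. Not pt's actual implementation.
proc fill_template {template code {user unknown} {file unknown} {name a_pe_grammar}} {
    # string map replaces all placeholders in a single pass, so text that
    # was just substituted is not scanned again for further placeholders.
    return [string map [list \
        @user@   $user \
        @format@ PEG \
        @file@   $file \
        @name@   $name \
        @code@   $code] $template]
}

set template "# grammar @name@ (@format@), from @file@, for @user@\n@code@"
puts [fill_template $template "PEG g (A) A <- 'a' ; END;" alice g.peg g]
```

       With the default template "@code@" the result is just the generated
       text, unchanged.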
191

JSON GRAMMAR EXCHANGE

193       The json format for parsing expression grammars was written as  a  data
194       exchange  format not bound to Tcl. It was defined to allow the exchange
195       of grammars with PackRat/PEG based parser  generators  for  other  lan‐
196       guages.
197
198       For  the  formal  specification  of  the  JSON grammar exchange format,
199       please go and read The JSON Grammar Exchange Format.
200
201       When used  as  a  result-format  this  format  supports  the  following
202       options:
203
204       -file string
205              The value of this option is the name of the file or other entity
206              from which the grammar came, for which the command is  run.  The
207              default value is unknown.
208
209       -name string
210              The  value of this option is the name of the grammar we are pro‐
211              cessing.  The default value is a_pe_grammar.
212
213       -user string
214              The value of this option is the name of the user for  which  the
215              command is run. The default value is unknown.
216
217       -indented boolean
218              If  this  option is set the system will break the generated JSON
219              across lines and indent it according  to  its  inner  structure,
220              with each key of a dictionary on a separate line.
221
222              If  the  option  is not set (the default), the whole JSON object
223              will be written on a single line, with minimum  spacing  between
224              all elements.
225
226       -aligned boolean
227              If this option is set the system will ensure that the values for
228              the keys in a dictionary are vertically aligned with each other,
229              for  a  nice  table effect.  To make this work this also implies
230              that -indented is set.
231
              If the option is not set (the default), the output is formatted
              as per the value of -indented, without trying to align the
              values for dictionary keys.
235

C PARSER EMBEDDED IN TCL

237       The critcl format is executable code, a parser for the grammar. It is a
238       Tcl  package  with  the  actual  parser implementation written in C and
239       embedded in Tcl via the critcl package.
240
241       This result-format supports the following options:
242
243       -file string
244              The value of this option is the name of the file or other entity
245              from  which  the grammar came, for which the command is run. The
246              default value is unknown.
247
248       -name string
249              The value of this option is the name of the grammar we are  pro‐
250              cessing.  The default value is a_pe_grammar.
251
252       -user string
253              The  value  of this option is the name of the user for which the
254              command is run. The default value is unknown.
255
256       -class string
257              The value of this option is the name of the class  to  generate,
258              without leading colons.  The default value is CLASS.
259
260              For a simple value X without colons, like CLASS, the parser com‐
261              mand will be X::X. Whereas  for  a  namespaced  value  X::Y  the
262              parser command will be X::Y.
263
264       -package string
265              The value of this option is the name of the package to generate.
266              The default value is PACKAGE.
267
268       -version string
269              The value of this option is the version of the package to gener‐
270              ate.  The default value is 1.
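
       The naming rule given for -class above can be written down as a short
       Tcl sketch. The proc name parser_command is hypothetical; it only
       restates the rule from the text and is not taken from the generator.

```tcl
# Illustrative sketch: derive the parser command from the -class value,
# following the rule described for the critcl format.
proc parser_command {class} {
    if {[string first :: $class] < 0} {
        # Simple value X without colons: the command is X::X.
        return ${class}::${class}
    }
    # Namespaced value X::Y: the command is used as given.
    return $class
}

puts [parser_command CLASS]
puts [parser_command calc::parser]
```

       Under this rule the default value CLASS yields the command
       CLASS::CLASS.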
271

C PARSER

273       The  c  format is executable code, a parser for the grammar. The parser
274       implementation is written in C and can be tweaked to the  users'  needs
275       through a multitude of options.
276
277       The  critcl  format, for example, is implemented as a canned configura‐
278       tion of these options on top of the generator for c.
279
280       This result-format supports the following options:
281
282       -file string
283              The value of this option is the name of the file or other entity
284              from  which  the grammar came, for which the command is run. The
285              default value is unknown.
286
287       -name string
288              The value of this option is the name of the grammar we are  pro‐
289              cessing.  The default value is a_pe_grammar.
290
291       -user string
292              The  value  of this option is the name of the user for which the
293              command is run. The default value is unknown.
294
295       -template string
296              The value of this option is a string into which to put the  gen‐
297              erated  text  and  the other configuration settings. The various
298              locations for user-data are expected to be  specified  with  the
299              placeholders listed below. The default value is "@code@".
300
301              @user@ To be replaced with the value of the option -user.
302
303              @format@
                     To be replaced with the constant C/PARAM.
305
306              @file@ To be replaced with the value of the option -file.
307
308              @name@ To be replaced with the value of the option -name.
309
              @code@ To be replaced with the generated C code.
311
              The following placeholders are special, in that they will occur
              within the generated code, and are replaced there as well.

              @statedecl@
                     To be replaced with the value of the option -state-decl.

              @stateref@
                     To be replaced with the value of the option -state-ref.

              @strings@
                     To be replaced with the value of the option
                     -string-varname.

              @self@ To be replaced with the value of the option -self-command.

              @def@  To be replaced with the value of the option
                     -fun-qualifier.

              @ns@   To be replaced with the value of the option -namespace.

              @main@ To be replaced with the value of the option -main.

              @prelude@
                     To be replaced with the value of the option -prelude.
336
337       -state-decl string
338              A C string representing the argument declaration to use  in  the
339              generated  parsing  functions  to refer to the parsing state. In
340              essence type and argument name.  The default value is the string
341              RDE_PARAM p.
342
343       -state-ref string
              A C string representing the argument name used in the generated
              parsing functions to refer to the parsing state. The default
              value is the string p.
347
348       -self-command string
              A C string representing the reference needed to call the
              generated parser functions (methods ...) from another parser
              function, per the chosen framework (template). The default
              value is the empty string.
353
354       -fun-qualifier string
355              A C string containing the attributes to give  to  the  generated
356              functions  (methods  ...),  per the chosen framework (template).
357              The default value is static.
358
359       -namespace string
360              The name of the C namespace the parser functions (methods,  ...)
361              shall  reside  in,  or  a  general prefix to add to the function
362              names.  The default value is the empty string.
363
364       -main string
365              The name of the main function (method, ...) to be called by  the
366              chosen framework (template) to start parsing input.  The default
367              value is __main.
368
369       -string-varname string
370              The name of the variable used for the table of strings  used  by
371              the  generated  parser,  i.e. error messages, symbol names, etc.
372              The default value is p_string.
373
374       -prelude string
375              A snippet of code to be inserted at the head of  each  generated
376              parsing function.  The default value is the empty string.
377
378       -indent integer
379              The  number  of  characters to indent each line of the generated
380              code by.  The default value is 0.
381
382       -comments boolean
383              A flag controlling the generation of  code  comments  containing
384              the  original parsing expression a parsing function is for.  The
385              default value is on.
386

SNIT PARSER

388       The snit format is executable code, a parser for the grammar. It  is  a
389       Tcl  package  holding  a  snit::type, i.e. a class, whose instances are
390       parsers for the input grammar.
391
392       This result-format supports the following options:
393
394       -file string
395              The value of this option is the name of the file or other entity
396              from  which  the grammar came, for which the command is run. The
397              default value is unknown.
398
399       -name string
400              The value of this option is the name of the grammar we are  pro‐
401              cessing.  The default value is a_pe_grammar.
402
403       -user string
404              The  value  of this option is the name of the user for which the
405              command is run. The default value is unknown.
406
407       -class string
408              The value of this option is the name of the class  to  generate,
409              without  leading colons. Note, it serves double-duty as the name
410              of the package to generate too, if option -package is not speci‐
411              fied,  see  below.  The default value is CLASS, applying if nei‐
412              ther option -class nor -package were specified.
413
414       -package string
415              The value of this option is the name of the package to generate,
416              without  leading colons. Note, it serves double-duty as the name
417              of the class to generate too, if option -class is not specified,
418              see  above.   The  default value is PACKAGE, applying if neither
419              option -package nor -class were specified.
420
421       -version string
422              The value of this option is the version of the package to gener‐
423              ate.  The default value is 1.
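
       The interplay of -class and -package described above can be
       summarized in a small Tcl sketch. The proc resolve_names is a
       hypothetical helper that only restates the documented defaults; it is
       not part of pt.

```tcl
# Illustrative sketch: resolve class and package names from the options
# given by the user, per the rules described for -class and -package.
proc resolve_names {options} {
    set hasclass   [dict exists $options -class]
    set haspackage [dict exists $options -package]
    if {$hasclass && $haspackage} {
        # Both specified: each option supplies its own name.
        set class   [dict get $options -class]
        set package [dict get $options -package]
    } elseif {$hasclass} {
        # Only -class: it doubles as the package name.
        set class   [dict get $options -class]
        set package $class
    } elseif {$haspackage} {
        # Only -package: it doubles as the class name.
        set package [dict get $options -package]
        set class   $package
    } else {
        # Neither specified: the documented defaults apply.
        set class   CLASS
        set package PACKAGE
    }
    return [dict create class $class package $package]
}

puts [resolve_names {-class calculator}]
```

       For the options of the example in section EXAMPLE, which specify only
       -class calculator, both the class and the package are named
       calculator.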
424

TCLOO PARSER

426       The oo format is executable code, a parser for the grammar. It is a Tcl
427       package holding a TclOO class, whose  instances  are  parsers  for  the
428       input grammar.
429
430       This result-format supports the following options:
431
432       -file string
433              The value of this option is the name of the file or other entity
434              from which the grammar came, for which the command is  run.  The
435              default value is unknown.
436
437       -name string
438              The  value of this option is the name of the grammar we are pro‐
439              cessing.  The default value is a_pe_grammar.
440
441       -user string
442              The value of this option is the name of the user for  which  the
443              command is run. The default value is unknown.
444
445       -class string
446              The  value  of this option is the name of the class to generate,
447              without leading colons. Note, it serves double-duty as the  name
448              of the package to generate too, if option -package is not speci‐
449              fied, see below.  The default value is CLASS, applying  if  nei‐
450              ther option -class nor -package were specified.
451
452       -package string
453              The value of this option is the name of the package to generate,
454              without leading colons. Note, it serves double-duty as the  name
455              of the class to generate too, if option -class is not specified,
456              see above.  The default value is PACKAGE,  applying  if  neither
457              option -package nor -class were specified.
458
459       -version string
460              The value of this option is the version of the package to gener‐
461              ate.  The default value is 1.
462

GRAMMAR CONTAINER

464       The container format is another form of describing  parsing  expression
465       grammars.  While  data in this format is executable it does not consti‐
466       tute a parser for the grammar. It always has to be used in  conjunction
467       with the package pt::peg::interp, a grammar interpreter.
468
469       The  format  represents  grammars  by  a  snit::type, i.e. class, whose
470       instances are API-compatible to the instances of the pt::peg::container
471       package, and which are preloaded with the grammar in question.
472
473       This result-format supports the following options:
474
475       -file string
476              The value of this option is the name of the file or other entity
477              from which the grammar came, for which the command is  run.  The
478              default value is unknown.
479
480       -name string
481              The  value of this option is the name of the grammar we are pro‐
482              cessing.  The default value is a_pe_grammar.
483
484       -user string
485              The value of this option is the name of the user for  which  the
486              command is run. The default value is unknown.
487
488       -mode bulk|incremental
489              The value of this option controls which methods of pt::peg::con‐
490              tainer instances are used to specify the grammar,  i.e.  preload
491              it  into  the  container.  There are two legal values, as listed
492              below. The default is bulk.
493
494              bulk   In this mode the methods start, add, modes, and rules are
495                     used  to  specify the grammar in a bulk manner, i.e. as a
496                     set of nonterminal symbols, and two dictionaries  mapping
497                     from  the  symbols  to  their  semantic modes and parsing
498                     expressions.
499
500                     This mode is the default.
501
502              incremental
                     In this mode the methods start, add, mode, and rule are
                     used to specify the grammar piecemeal, with each
                     nonterminal having its own block of defining commands.
506
507       -template string
508              The value of this option is a string into which to put the  gen‐
509              erated  code  and  the other configuration settings. The various
510              locations for user-data are expected to be  specified  with  the
511              placeholders listed below. The default value is "@code@".
512
513              @user@ To be replaced with the value of the option -user.
514
515              @format@
                     To be replaced with the constant CONTAINER.
517
518              @file@ To be replaced with the value of the option -file.
519
520              @name@ To be replaced with the value of the option -name.
521
522              @mode@ To be replaced with the value of the option -mode.
523
524              @code@ To be replaced with the generated code.
525

EXAMPLE

       In this section we work through a complete example, starting with a
       PEG grammar and ending with running the parser generated from it over
       some input, following the outline shown in the figure below:
530
531       IMAGE: flow
532
       Our grammar, assumed to be stored in the file "calculator.peg", is

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       From this we create a snit-based parser via

              pt generate snit calculator.tcl -class calculator -name calculator peg calculator.peg

       which leaves us with the parser package and class written to the file
       "calculator.tcl". Assuming that this package is then properly
       installed in a place where Tcl can find it, we can now use this class
       via a script like

              package require calculator

              lassign $argv input
              set channel [open $input r]

              set parser [calculator]
              set ast [$parser parse $channel]
              $parser destroy
              close $channel

              # ... now process the returned abstract syntax tree ...

       where the abstract syntax tree stored in the variable ast will look
       like

              set ast {Expression 0 4
                  {Factor 0 4
                      {Term 0 2
                          {Number 0 2
                              {Digit 0 0}
                              {Digit 1 1}
                              {Digit 2 2}
                          }
                      }
                      {AddOp 3 3}
                      {Term 4 4
                          {Number 4 4
                              {Digit 4 4}
                          }
                      }
                  }
              }

       assuming that the input file and channel contained the text

              120+5

       A more graphical representation of the tree would be

                                                                  +- Digit 0 0 | 1
                                                                  |            |
                                      +- Term 0 2 --- Number 0 2 -+- Digit 1 1 | 2
                                      |                           |            |
                                      |                           +- Digit 2 2 | 0
                                      |                                        |
       Expression 0 4 --- Factor 0 4 -+----------------------------- AddOp 3 3 | +
                                      |                                        |
                                      +- Term 4 4 --- Number 4 4 --- Digit 4 4 | 5
601
       Regardless, at this point it is the user's responsibility to work with
       the tree to reach whatever goal she desires, i.e. analyze it, transform
       it, etc. The package pt::ast should be of help here, providing commands
       to walk such AST structures in various ways.
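
       As a minimal sketch of such processing, the following plain-Tcl code
       walks the tree shown above by hand and collects the input text matched
       by every node of a given symbol. It deliberately does not use pt::ast;
       the proc name ast_text_of is hypothetical, and the only assumption
       about the data is the node layout visible in the example: a list of
       symbol, start offset, end offset (inclusive), followed by the child
       nodes.

```tcl
# Illustrative sketch: collect the text matched by all nodes of a symbol.
# A node is a list: symbol start end ?child ...?, with inclusive offsets.
proc ast_text_of {ast input symbol} {
    # lassign returns the elements it did not assign, i.e. the children.
    set children [lassign $ast sym start end]
    set result {}
    if {$sym eq $symbol} {
        lappend result [string range $input $start $end]
    }
    foreach child $children {
        lappend result {*}[ast_text_of $child $input $symbol]
    }
    return $result
}

# The tree and the input text from the example above.
set ast {Expression 0 4
    {Factor 0 4
        {Term 0 2
            {Number 0 2
                {Digit 0 0}
                {Digit 1 1}
                {Digit 2 2}
            }
        }
        {AddOp 3 3}
        {Term 4 4
            {Number 4 4
                {Digit 4 4}
            }
        }
    }
}
puts [ast_text_of $ast "120+5" Number]   ;# prints: 120 5
```

       The same walk with the symbol AddOp yields the operator, "+".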
606
607       One important thing to note is that the parsers used here return a data
608       structure representing the structure  of  the  input  per  the  grammar
609       underlying  the  parser.  There  are  no  callbacks  during the parsing
610       process, i.e. no parsing actions, as most other parsers will have.
611
612       Going back to the last snippet of code, the execution of the parser for
613       some  input,  note how the parser instance follows the specified Parser
614       API.
615

INTERNALS

       This section is intended for users of the application who wish to
       modify or extend it. Users only interested in the generation of parsers
       can ignore it.

       The main functionality of the application is encapsulated in the
       package pt::pgen. Please read its documentation for more information.
623

BUGS, IDEAS, FEEDBACK

625       This  document,  and the package it describes, will undoubtedly contain
626       bugs and other problems.  Please report such in the category pt of  the
627       Tcllib  Trackers  [http://core.tcl.tk/tcllib/reportlist].   Please also
628       report any ideas for enhancements  you  may  have  for  either  package
629       and/or documentation.
630
       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.
633
634       Note further that  attachments  are  strongly  preferred  over  inlined
635       patches.  Attachments  can  be  made  by  going to the Edit form of the
636       ticket immediately after its creation, and  then  using  the  left-most
637       button in the secondary navigation bar.
638

KEYWORDS

640       EBNF,  LL(k),  PEG,  TDPL, context-free languages, expression, grammar,
641       matching, parser, parsing expression, parsing expression grammar,  push
642       down  automaton,  recursive descent, state, top-down parsing languages,
643       transducer
644

CATEGORY

646       Parsing and Grammars
647
       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                 1                                 pt(n)