page(n)                        Development Tools                       page(n)

______________________________________________________________________________

NAME

       page - Parser Generator

SYNOPSIS

       page ?options...? ?input ?output??

______________________________________________________________________________

DESCRIPTION

       The application described by this document, page, is not just a parser
       generator, as its name implies, but a generic tool for executing
       arbitrary transformations on texts.

       Its genericity comes from the use of plugins for reading, transforming,
       and writing data. The predefined set of plugins provided by Tcllib
       generates memoizing recursive descent parsers (aka packrat parsers)
       from grammar specifications (Parsing Expression Grammars).

       page is written on top of the package page::pluginmgr, wrapping its
       functionality into a command line based application. All the other
       page::* packages are plugin and/or supporting packages for the
       generation of parsers. The parsers themselves are based on the packages
       grammar::peg, grammar::peg::interp, and grammar::mengine.

   COMMAND LINE
       page ?options...? ?input ?output??
              This is the general form for calling page. The application will
              read the contents of the file input, process them under the
              control of the specified options, and then write the result to
              the file output.

              If input is the string -, the data to process will be read from
              stdin instead of a file. Analogously, the result will be written
              to stdout instead of a file if output is the string -. A missing
              input or output specification causes the application to assume
              -.

              The detailed specifications of the recognized options are
              provided in section OPTIONS.

              path input (in)
                     This argument specifies the path to the file to be
                     processed by the application, or -. The latter value
                     causes the application to read the text from stdin.
                     Otherwise the file has to exist and be readable. If the
                     argument is missing, - is assumed.

              path output (in)
                     This argument specifies where to write the generated
                     text. It can be the path to a file, or -. The latter
                     value causes the application to write the generated
                     document to stdout.

                     If the file output does not exist then [file dirname
                     $output] has to exist and must be a writable directory,
                     as the application will create the file to write to.

                     If the argument is missing, - is assumed.
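
       The defaulting and existence rules for input and output can be
       modelled in a few lines. The following is a minimal Python sketch of
       the rules as stated in this section, not code taken from page (which
       is written in Tcl). The function names resolve_io and check_output are
       hypothetical.

```python
import os


def resolve_io(positional):
    """Return (input, output); a missing argument defaults to "-"."""
    if len(positional) > 2:
        raise ValueError("expected at most two arguments: ?input ?output??")
    padded = list(positional) + ["-"] * (2 - len(positional))
    return padded[0], padded[1]


def check_output(output):
    """Return True if output is acceptable under the rules described above.

    "-" stands for stdout. If the file does not exist, its directory has to
    exist and be writable. Assumption: an existing file is accepted without
    any further check, as the text does not say more about that case.
    """
    if output == "-" or os.path.exists(output):
        return True
    # Like Tcl's [file dirname], treat a bare file name as living in ".".
    directory = os.path.dirname(output) or "."
    return os.path.isdir(directory) and os.access(directory, os.W_OK)
```

       For example, resolve_io(["grammar.peg"]) yields the pair
       ("grammar.peg", "-"), i.e. the result goes to stdout.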

   OPERATION
       ... reading ... transforming ... writing - plugins - pipeline ...

   OPTIONS
       This section describes all the options available to the user of the
       application. Options are always processed in order. I.e. if both --help
       and --version are specified, the option encountered first has
       precedence.

       Unknown options specified before any of the options -rd, -wr, or -tr
       will cause processing to abort with an error. Unknown options coming
       in between these options, or after the last of them, are assumed to
       always take a single argument and are associated with the last plugin
       option coming before them. They are checked after all the relevant
       plugins, and thus the options they understand, are known. I.e. such
       an unknown option causes an error if and only if the plugin option it
       is associated with does not understand it, and was not superseded by
       a plugin option coming after.

       Default options are used if and only if the command line did not
       contain any options at all. They set the application up as a PEG-based
       parser generator. The exact list of options is

              -c peg

       And now the recognized options and their arguments, if they have any:

       --help

       -h

       -?     When one of these options is found on the command line all
              arguments coming before or after are ignored. The application
              will print a short description of the recognized options and
              exit.

       --version

       -V     When one of these options is found on the command line all
              arguments coming before or after are ignored. The application
              will print its own revision and exit.

       -P     This option signals the application to activate visual feedback
              while reading the input.

       -T     This option signals the application to collect statistics while
              reading the input and to print them after reading has completed,
              before processing starts.

       -D     This option signals the application to activate logging in the
              Safe base, for the debugging of problems with plugins.

       -r parser

       -rd parser

       --reader parser
              These options specify the plugin the application has to use for
              reading the input. If the options are used multiple times the
              last one will be used.

       -w generator

       -wr generator

       --writer generator
              These options specify the plugin the application has to use for
              generating and writing the final output. If the options are used
              multiple times the last one will be used.

       -t process

       -tr process

       --transform process
              These options specify a plugin to run on the input. In contrast
              to readers and writers, each use does not supersede previous
              uses, but adds the chosen plugin to a list of transformations,
              either at the front or at the end, per the last seen use of
              either option -p or -a. The initial default is to append the new
              transformations.

       -a

       --append
              These options signal the application that all following
              transformations should be added at the end of the list of
              transformations.

       -p

       --prepend
              These options signal the application that all following
              transformations should be added at the beginning of the list of
              transformations.

       --reset
              This option signals the application to clear the list of
              transformations. This is necessary to wipe out the default
              transformations used.

       -c file

       --configuration file
              This option causes the application to load a configuration file
              and/or plugin. This is a plugin which in essence provides a
              predefined set of command line options. They are processed
              exactly as if they had been specified in place of the option and
              its argument. This means that unknown options found at the
              beginning of the configuration file are associated with the last
              plugin, even if that plugin was specified before the
              configuration file itself. Conversely, unknown options coming
              after the configuration file can be associated with a plugin
              specified in the file.

              If the argument is a file which cannot be loaded as a plugin the
              application will assume that its contents are a list of options
              and their arguments, separated by spaces, tabs, and newlines.
              Options and arguments containing spaces can be quoted via
              double-quotes (") and quotes ('). The quote character can be
              specified within a quoted string by doubling it. Newlines in a
              quoted string are accepted as is.
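
       The quoting rules for such a plain list of options can be illustrated
       with a small tokenizer. This is a minimal Python sketch of the rules
       stated above, not the parser used by page. The function name
       split_options is hypothetical, and the sketch assumes that a quote
       character only opens a quoted string at the start of a word.

```python
def split_options(text):
    """Split the contents of an options file into a list of words."""
    words = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in " \t\n":
            # Spaces, tabs, and newlines separate words.
            i += 1
            continue
        if ch in "\"'":
            quote = ch
            i += 1
            chars = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted string")
                if text[i] == quote:
                    if i + 1 < n and text[i + 1] == quote:
                        # A doubled quote character stands for itself.
                        chars.append(quote)
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                # Everything else, newlines included, is taken as is.
                chars.append(text[i])
                i += 1
            words.append("".join(chars))
        else:
            start = i
            while i < n and text[i] not in " \t\n":
                i += 1
            words.append(text[start:i])
    return words
```

       With this sketch the text -w 'it''s me' splits into the two words -w
       and it's me.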

   PLUGINS
       page makes use of four different types of plugins, namely: readers,
       writers, transformations, and configurations. Here we provide only a
       basic introduction on how to use them from page. The exact APIs
       provided to and expected from the plugins can be found in the
       documentation for page::pluginmgr, for those who wish to write their
       own plugins.

       Plugins are specified as arguments to the options -r, -w, -t, -c, and
       their equivalent longer forms. See the section OPTIONS for reference.

       Each such argument is first treated as the name of a file, and this
       file is loaded as the plugin. If however there is no file with that
       name, then it is translated into the name of a package, and this
       package is then loaded. For each type of plugin the package management
       searches not only the regular paths, but a set of application- and
       type-specific paths as well. Please see the section PLUGIN LOCATIONS
       for a listing of all paths and their sources.
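
       The file-first, package-second lookup can be summarized as follows.
       This is a minimal Python sketch of the rule as described here, not the
       code of page::pluginmgr. The function name resolve_plugin and the
       table PLUGIN_TYPES are hypothetical; the package name pattern
       page::<type>::<name> is the one given for each plugin type in this
       section.

```python
import os

# Short option -> plugin type used in the package name.
PLUGIN_TYPES = {"-c": "config", "-r": "reader", "-w": "writer", "-t": "transform"}


def resolve_plugin(option, argument):
    """Return ("file", path) or ("package", name) for a plugin argument."""
    if os.path.isfile(argument):
        # An existing file always wins and is loaded directly.
        return ("file", argument)
    # Otherwise the argument is translated into a package name.
    return ("package", "page::{}::{}".format(PLUGIN_TYPES[option], argument))
```

       Under these assumptions, and with no file named me in the current
       directory, the argument of -w me resolves to the package
       page::writer::me.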

       -c name
              Configurations. The name of the package for the plugin name is
              "page::config::name".

              We have one predefined plugin:

              peg    It sets the application up as a parser generator
                     accepting parsing expression grammars and writing a
                     packrat parser in Tcl. The actual arguments it specifies
                     are:

                       --reset
                       --append
                       --reader    peg
                       --transform reach
                       --transform use
                       --writer    me

       -r name
              Readers. The name of the package for the plugin name is
              "page::reader::name".

              We have five predefined plugins:

              peg    Interprets the input as a parsing expression grammar
                     (PEG) and generates a tree representation for it. Both
                     the syntax of PEGs and the structure of the tree
                     representation are explained in their own manpages.

              hb     Interprets the input as Tcl code as generated by the
                     writer plugin hb and generates its tree representation.

              ser    Interprets the input as the serialization of a PEG, as
                     generated by the writer plugin ser, using the package
                     grammar::peg.

              lemon  Interprets the input as a grammar specification as
                     understood by Richard Hipp's LEMON parser generator and
                     generates a tree representation for it. Both the input
                     syntax and the structure of the tree representation are
                     explained in their own manpages.

              treeser
                     Interprets the input as the serialization of a
                     struct::tree. It is validated as such, but nothing else.
                     It is not assumed to be the tree representation of a
                     grammar.

       -w name
              Writers. The name of the package for the plugin name is
              "page::writer::name".

              We have eight predefined plugins:

              identity
                     Simply writes the incoming data as it is, without making
                     any changes. This is good for inspecting the raw result
                     of a reader or transformation.

              null   Generates nothing, and ignores the incoming data
                     structure.

              tree   Assumes that the incoming data structure is a
                     struct::tree and generates an indented textual
                     representation of all nodes, their parental
                     relationships, and their attribute information.

              peg    Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar and writes it
                     out as a PEG. The result is nicely formatted and
                     partially simplified (strings as sequences of
                     characters). A pretty printer in essence, but it can also
                     be used to obtain a canonical representation of the input
                     grammar.

              tpc    Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar and writes out
                     Tcl code defining a package which defines a grammar::peg
                     object containing the grammar when it is loaded into an
                     interpreter.

              hb     This is like the writer plugin tpc, but it writes only
                     the statements which define the start expression and
                     grammar rules. The code making the result a package is
                     left out.

              ser    Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar, transforms it
                     internally into a grammar::peg object and writes out its
                     serialization.

              me     Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar and writes out
                     Tcl code defining a package which implements a memoizing
                     recursive descent parser based on the match engine (ME)
                     provided by the package grammar::mengine.

       -t name
              Transformers. The name of the package for the plugin name is
              "page::transform::name".

              We have two predefined plugins:

              reach  Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar. It determines
                     which nonterminal symbols and rules are reachable from
                     the start-symbol/expression. All nonterminal symbols
                     which were not reached are removed.

              use    Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar. It determines
                     which nonterminal symbols and rules are able to generate
                     a finite sequence of terminal symbols (in the sense of a
                     Context Free Grammar). All nonterminal symbols which were
                     not deemed useful in this sense are removed.
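
       The order in which the chosen transformers run is governed by the
       options -a, -p, and --reset described in section OPTIONS. The
       following minimal Python sketch models that bookkeeping. It is not the
       code of page; the function name build_transform_list is hypothetical,
       and it assumes that in prepend mode each new transformation is placed
       at the very front of the list.

```python
def build_transform_list(options):
    """Compute the list of transformations from (option, argument) pairs.

    Options without an argument carry None as their argument.
    """
    transforms = []
    append = True  # the initial default is to append
    for option, argument in options:
        if option in ("-a", "--append"):
            append = True
        elif option in ("-p", "--prepend"):
            append = False
        elif option == "--reset":
            transforms.clear()
        elif option in ("-t", "-tr", "--transform"):
            if append:
                transforms.append(argument)
            else:
                # Assumption: each prepended plugin goes to the very front.
                transforms.insert(0, argument)
        # All other options do not affect the list of transformations.
    return transforms
```

       Fed with the arguments of the predefined configuration peg, this
       sketch yields the list reach, use.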

   PLUGIN LOCATIONS
       The application-specific paths searched by page either are, or come
       from:

       [1]    The directory            "~/.page/plugin"

       [2]    The environment variable PAGE_PLUGINS

       [3]    The registry entry       HKEY_LOCAL_MACHINE\SOFTWARE\PAGE\PLUGINS

       [4]    The registry entry       HKEY_CURRENT_USER\SOFTWARE\PAGE\PLUGINS

       The type-specific paths searched by page either are, or come from:

       [1]    The directory            "~/.page/plugin/<TYPE>"

       [2]    The environment variable PAGE_<TYPE>_PLUGINS

       [3]    The registry entry       HKEY_LOCAL_MACHINE\SOFTWARE\PAGE\<TYPE>\PLUGINS

       [4]    The registry entry       HKEY_CURRENT_USER\SOFTWARE\PAGE\<TYPE>\PLUGINS

       Where the placeholder <TYPE> is always one of the values below,
       properly capitalized.

       [1]    reader

       [2]    writer

       [3]    transform

       [4]    config

       The registry entries are specific to the Windows(tm) platform; all
       other platforms will ignore them.

       The contents of both environment variables and registry entries are
       interpreted as a list of paths, with the elements separated by either
       colon (Unix), or semicolon (Windows).
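
       How the directory and environment variable sources combine for one
       plugin type can be sketched as follows. This is a minimal Python
       model, not the code of page. The names split_path_list and
       search_paths are hypothetical. The sketch assumes that "properly
       capitalized" means lower case in the directory name and upper case in
       the variable name, it leaves out the registry entries, and the order
       of the resulting list is an arbitrary choice, as this document does
       not state a search order.

```python
import os

PLUGIN_TYPES = ("reader", "writer", "transform", "config")


def split_path_list(value, separator=os.pathsep):
    """Split a path list; os.pathsep is ":" on Unix and ";" on Windows."""
    return [path for path in value.split(separator) if path]


def search_paths(plugin_type, environ=None, home="~"):
    """Collect the application- and type-specific paths for a plugin type."""
    if plugin_type not in PLUGIN_TYPES:
        raise ValueError("unknown plugin type: " + plugin_type)
    environ = os.environ if environ is None else environ
    base = os.path.join(os.path.expanduser(home), ".page", "plugin")
    paths = [base]
    paths += split_path_list(environ.get("PAGE_PLUGINS", ""))
    # Assumption: lower case type in the directory, upper case in the variable.
    paths.append(os.path.join(base, plugin_type))
    variable = "PAGE_{}_PLUGINS".format(plugin_type.upper())
    paths += split_path_list(environ.get(variable, ""))
    return paths
```

       For the type reader this sketch consults the variable
       PAGE_READER_PLUGINS in addition to PAGE_PLUGINS.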

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category page of
       the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please
       also report any ideas for enhancements you may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

SEE ALSO

       page::pluginmgr

KEYWORDS

       parser generator, text processing

CATEGORY

       Page Parser Generator

COPYRIGHT

       Copyright (c) 2005 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                1.0                              page(n)