page(n) Development Tools page(n)

______________________________________________________________________________

NAME
page - Parser Generator

SYNOPSIS
page ?options...? ?input ?output??

______________________________________________________________________________

DESCRIPTION
The application described by this document, page, is actually not just
a parser generator, as the name implies, but a generic tool for the
execution of arbitrary transformations on texts.

Its genericity comes from the use of plugins for reading, transforming,
and writing data. The predefined set of plugins provided by Tcllib is
for the generation of memoizing recursive descent parsers (aka packrat
parsers) from grammar specifications (Parsing Expression Grammars).

page is written on top of the package page::pluginmgr, wrapping its
functionality into a command-line application. All the other page::*
packages are plugin and/or supporting packages for the generation of
parsers. The parsers themselves are based on the packages grammar::peg,
grammar::peg::interp, and grammar::mengine.

COMMAND LINE
page ?options...? ?input ?output??
    This is the general form for calling page. The application reads
    the contents of the file input, processes them under the control
    of the specified options, and then writes the result to the file
    output.

    If input is the string "-", the data to process is read from stdin
    instead of a file. Analogously, the result is written to stdout
    instead of a file if output is the string "-". A missing input or
    output specification causes the application to assume "-".

    The detailed specifications of the recognized options are provided
    in section OPTIONS.

    path input (in)
        This argument specifies the path to the file to be processed by
        the application, or "-". The latter causes the application to
        read the text from stdin. Otherwise the file has to exist and be
        readable. If the argument is missing, "-" is assumed.

    path output (in)
        This argument specifies where to write the generated text. It
        can be the path to a file, or "-". The latter causes the
        application to write the generated document to stdout.

        If the file output does not exist, then [file dirname $output]
        has to exist and must be a writable directory, as the
        application will create the file to write to.

        If the argument is missing, "-" is assumed.
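
The defaulting of the two trailing arguments can be modelled in a few
lines. This is a hypothetical Python sketch, not page's Tcl code; the
helper names resolve_io, open_input, and open_output are invented for
illustration.

```python
import sys


def resolve_io(positional):
    """Map the trailing arguments ?input ?output?? to their effective values.

    A missing argument defaults to "-" (stdin for input, stdout for output).
    """
    if len(positional) > 2:
        raise ValueError("expected at most two arguments: ?input ?output??")
    padded = list(positional) + ["-"] * (2 - len(positional))
    return padded[0], padded[1]


def open_input(path):
    # "-" selects stdin; any other value must name an existing, readable file.
    return sys.stdin if path == "-" else open(path, "r")


def open_output(path):
    # "-" selects stdout; otherwise the file is opened for writing, which
    # creates it if needed, so its directory must exist and be writable.
    return sys.stdout if path == "-" else open(path, "w")


print(resolve_io([]))               # ('-', '-')
print(resolve_io(["grammar.peg"]))  # ('grammar.peg', '-')
```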
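
OPERATION
page operates as a pipeline of plugins: a reader plugin reads the input
and converts it into a data structure, the transformation plugins, if
any, are then run on that data structure in the order of their list,
and finally a writer plugin generates the output from the result. See
the sections OPTIONS and PLUGINS for how each stage is selected.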
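
The reading, transforming, and writing stages can be pictured as a
simple function pipeline. The sketch below is a hypothetical Python
model: real page plugins are Tcl packages loaded through
page::pluginmgr with their own API, and the names Reader, Transform,
Writer, and run_pipeline are illustrative only.

```python
from typing import Any, Callable, List

# Hypothetical stand-ins for the three plugin kinds.
Reader = Callable[[str], Any]      # input text -> data structure
Transform = Callable[[Any], Any]   # data structure -> data structure
Writer = Callable[[Any], str]      # data structure -> output text


def run_pipeline(text: str, reader: Reader,
                 transforms: List[Transform], writer: Writer) -> str:
    data = reader(text)
    for transform in transforms:   # applied in list order
        data = transform(data)
    return writer(data)


# Toy plugins: split into words, sort them, join them back.
print(run_pipeline("b a c", str.split, [sorted], " ".join))  # a b c
```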

OPTIONS
This section describes all the options available to the user of the
application. Options are always processed in order. For example, if
both --help and --version are specified, the option encountered first
takes precedence.

Unknown options specified before any of the options -rd, -wr, or -tr
cause processing to abort with an error. Unknown options coming
between these options, or after the last of them, are assumed to
always take a single argument and are associated with the last plugin
option coming before them. They are checked after all the relevant
plugins, and thus the options they understand, are known. That is,
such unknown options cause an error if and only if the plugin option
they are associated with does not understand them and was not
superseded by a plugin option coming after.
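
The association rule for unknown options can be sketched as follows.
This is a hypothetical Python model, not page's Tcl code: the function
associate, the understands table, and the FLAGS subset are invented
for illustration, -c is not handled, and the input and output
arguments are assumed to have been removed from argv beforehand.

```python
# Plugin options and the slot each one fills.
PLUGIN_OPTIONS = {
    "-r": "reader", "-rd": "reader", "--reader": "reader",
    "-w": "writer", "-wr": "writer", "--writer": "writer",
    "-t": "transform", "-tr": "transform", "--transform": "transform",
}
# Known options without arguments (a subset, for illustration).
FLAGS = {"-P", "-T", "-D", "-a", "--append", "-p", "--prepend", "--reset"}


def associate(argv, understands):
    """Attach unknown options to the preceding plugin option, then check them.

    understands maps a plugin name to the set of options it accepts.
    Returns (reader, writer, transforms); each plugin is a dict with
    "name" and "options" keys.
    """
    reader = writer = None
    transforms = []
    current = None  # plugin that unknown options attach to
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in FLAGS:
            i += 1
            continue
        if i + 1 >= len(argv):
            raise ValueError(f"option {arg!r} is missing its argument")
        if arg in PLUGIN_OPTIONS:
            record = {"name": argv[i + 1], "options": []}
            kind = PLUGIN_OPTIONS[arg]
            if kind == "reader":
                reader = record            # a later reader supersedes this one
            elif kind == "writer":
                writer = record            # likewise for writers
            else:
                transforms.append(record)  # transformations accumulate
            current = record
        elif current is None:
            raise ValueError(f"unknown option {arg!r} before any plugin option")
        else:
            # Unknown options are assumed to take exactly one argument.
            current["options"].append((arg, argv[i + 1]))
        i += 2
    # Check only plugins that were not superseded.
    survivors = [p for p in (reader, writer) if p is not None] + transforms
    for plugin in survivors:
        accepted = understands.get(plugin["name"], set())
        for option, _value in plugin["options"]:
            if option not in accepted:
                raise ValueError(
                    f"plugin {plugin['name']!r} does not understand {option!r}")
    return reader, writer, transforms
```

For example, with the words -r bogus -q 1 -r peg -x 2, the reader bogus
is superseded by peg, so its unknown option -q is never checked, while
-x is checked against the options peg understands.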

Default options are used if and only if the command line did not
contain any options at all. They set the application up as a
PEG-based parser generator. The exact list of options is

    -c peg
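
The "defaults only when no options at all" rule can be sketched as
below. This is a hypothetical Python model; the name effective_argv is
invented, and it treats any word starting with "-" (other than a lone
"-") as an option, a simplification that would misclassify a file
whose name begins with "-".

```python
DEFAULT_OPTIONS = ["-c", "peg"]


def effective_argv(argv):
    """Prepend the default options if the command line has no options.

    Simplification: any word starting with "-" other than a lone "-"
    (which names stdin/stdout) counts as an option.
    """
    has_option = any(arg.startswith("-") and arg != "-" for arg in argv)
    return list(argv) if has_option else DEFAULT_OPTIONS + list(argv)


print(effective_argv(["grammar.peg", "parser.tcl"]))
# ['-c', 'peg', 'grammar.peg', 'parser.tcl']
```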

And now the recognized options and their arguments, if they have any:

--help
-h
-?
    When one of these options is found on the command line, all
    arguments coming before or after it are ignored. The application
    will print a short description of the recognized options and exit.

--version
-V
    When one of these options is found on the command line, all
    arguments coming before or after it are ignored. The application
    will print its own revision and exit.
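
Because options are processed in order, whichever of the help or
version options appears first decides what happens. A minimal Python
sketch of that rule follows; terminal_action is a hypothetical name,
and the sketch scans every word without accounting for words that are
arguments of other options.

```python
HELP = {"--help", "-h", "-?"}
VERSION = {"--version", "-V"}


def terminal_action(argv):
    """Return "help", "version", or None for the first such option seen."""
    for arg in argv:          # options are processed strictly in order
        if arg in HELP:
            return "help"
        if arg in VERSION:
            return "version"
    return None               # neither present: continue normal processing
```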

-P
    This option signals the application to activate visual feedback
    while reading the input.

-T
    This option signals the application to collect statistics while
    reading the input and to print them after reading has completed,
    before processing starts.

-D
    This option signals the application to activate logging in the
    Safe base, for the debugging of problems with plugins.

-r parser
-rd parser
--reader parser
    These options specify the plugin the application has to use for
    reading the input. If the options are used multiple times, the
    last one will be used.

-w generator
-wr generator
--writer generator
    These options specify the plugin the application has to use for
    generating and writing the final output. If the options are used
    multiple times, the last one will be used.

-t process
-tr process
--transform process
    These options specify a plugin to run on the input. In contrast to
    readers and writers, each use does not supersede previous uses,
    but adds the chosen plugin to a list of transformations, either at
    the front or at the end, per the last seen use of either option -p
    or -a. The initial default is to append the new transformations.

-a
--append
    These options signal the application that all following
    transformations should be added at the end of the list of
    transformations.

-p
--prepend
    These options signal the application that all following
    transformations should be added at the beginning of the list of
    transformations.

--reset
    This option signals the application to clear the list of
    transformations. This is necessary to wipe out the default
    transformations used.
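
How -t, -a, -p, and --reset together build the transformation list can
be sketched in Python. This is a hypothetical model: build_transforms
and its initial parameter are invented, other options are simply
skipped one word at a time, and each prepended transformation is
assumed to be inserted at the front in turn, so consecutive prepends
end up in reverse order.

```python
def build_transforms(argv, initial=()):
    """Compute the transformation list from -t/-a/-p/--reset options.

    initial stands in for transformations already on the list
    (for example, ones contributed by a configuration).
    """
    transforms = list(initial)
    prepend = False  # initial default: append
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-a", "--append"):
            prepend = False
        elif arg in ("-p", "--prepend"):
            prepend = True
        elif arg == "--reset":
            transforms.clear()
        elif arg in ("-t", "-tr", "--transform"):
            if i + 1 >= len(argv):
                raise ValueError(f"option {arg!r} is missing its argument")
            i += 1
            name = argv[i]
            if prepend:
                transforms.insert(0, name)
            else:
                transforms.append(name)
        i += 1
    return transforms


print(build_transforms(["-t", "a", "-p", "-t", "b", "-t", "c"]))  # ['c', 'b', 'a']
```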

-c file
--configuration file
    This option causes the application to load a configuration file
    and/or plugin. This is a plugin which in essence provides a
    predefined set of command-line options. They are processed exactly
    as if they had been specified in place of the option and its
    argument. This means that unknown options found at the beginning
    of the configuration file are associated with the last plugin,
    even if that plugin was specified before the configuration file
    itself. Conversely, unknown options coming after the configuration
    file can be associated with a plugin specified in the file.

    If the argument is a file which cannot be loaded as a plugin, the
    application will assume that its contents are a list of options
    and their arguments, separated by spaces, tabs, and newlines.
    Options and arguments containing spaces can be quoted via double
    quotes (") or single quotes ('). The quote character can be
    specified within a quoted string by doubling it. Newlines in a
    quoted string are accepted as is.
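
The quoting rules for a plain configuration file, and the splicing of
its words in place of the option, can be sketched as below. This is a
hypothetical Python model: tokenize_config, expand_config, and the
read_file callback are invented names, the attempt to load the file as
a plugin first is not modelled, and nested configurations are not
expanded.

```python
SEPARATORS = " \t\n"
QUOTES = "\"'"


def tokenize_config(text):
    """Split configuration text into option words.

    Words are separated by spaces, tabs, and newlines. A word may be
    quoted with " or '; inside it, a doubled quote character stands for
    one literal quote, and newlines are kept as is.
    """
    words, i, n = [], 0, len(text)
    while i < n:
        ch = text[i]
        if ch in SEPARATORS:
            i += 1
        elif ch in QUOTES:
            quote, buf = ch, []
            i += 1
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted string")
                if text[i] == quote:
                    if i + 1 < n and text[i + 1] == quote:
                        buf.append(quote)  # doubled quote -> literal quote
                        i += 2
                        continue
                    i += 1                 # closing quote
                    break
                buf.append(text[i])        # includes newlines, kept as is
                i += 1
            words.append("".join(buf))
        else:
            start = i
            while i < n and text[i] not in SEPARATORS:
                i += 1
            words.append(text[start:i])
    return words


def expand_config(argv, read_file):
    """Splice the words of each -c/--configuration file in place."""
    out, i = [], 0
    while i < len(argv):
        if argv[i] in ("-c", "--configuration") and i + 1 < len(argv):
            out.extend(tokenize_config(read_file(argv[i + 1])))
            i += 2
        else:
            out.append(argv[i])
            i += 1
    return out
```

Splicing the words in place is what makes unknown options at the start
of the file attach to a plugin named earlier on the command line.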

PLUGINS
page makes use of four different types of plugins, namely: readers,
writers, transformations, and configurations. Here we provide only a
basic introduction on how to use them from page. The exact APIs
provided to and expected from the plugins can be found in the
documentation for page::pluginmgr, for those who wish to write their
own plugins.

Plugins are specified as arguments to the options -r, -w, -t, -c, and
their equivalent longer forms. See the section OPTIONS for reference.

Each such argument is first treated as the name of a file, and this
file is loaded as the plugin. If, however, there is no file with that
name, then it is translated into the name of a package, and this
package is then loaded. For each type of plugin the package management
searches not only the regular paths, but also a set of
application-specific and type-specific paths. Please see the section
PLUGIN LOCATIONS for a listing of all paths and their sources.
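
The file-first, package-second resolution, together with the package
naming scheme page::<type>::<name> listed below for each plugin type,
can be sketched in Python. resolve_plugin and its is_file parameter
are hypothetical; the real lookup is done by page::pluginmgr in Tcl.

```python
import os

PLUGIN_TYPES = ("config", "reader", "writer", "transform")


def resolve_plugin(plugin_type, name, is_file=os.path.isfile):
    """Return ("file", path) or ("package", package_name) for a plugin."""
    if plugin_type not in PLUGIN_TYPES:
        raise ValueError(f"unknown plugin type {plugin_type!r}")
    if is_file(name):          # an existing file always wins
        return ("file", name)
    return ("package", f"page::{plugin_type}::{name}")


print(resolve_plugin("reader", "peg", is_file=lambda path: False))
# ('package', 'page::reader::peg')
```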

-c name
    Configurations. The name of the package for the plugin name is
    "page::config::name".

    We have one predefined plugin:

    peg
        It sets the application up as a parser generator accepting
        parsing expression grammars and writing a packrat parser in
        Tcl. The actual arguments it specifies are:

            --reset
            --append
            --reader peg
            --transform reach
            --transform use
            --writer me

-r name
    Readers. The name of the package for the plugin name is
    "page::reader::name".

    We have five predefined plugins:

    peg
        Interprets the input as a parsing expression grammar (PEG) and
        generates a tree representation for it. Both the syntax of
        PEGs and the structure of the tree representation are
        explained in their own manpages.

    hb
        Interprets the input as Tcl code as generated by the writer
        plugin hb and generates its tree representation.

    ser
        Interprets the input as the serialization of a PEG, as
        generated by the writer plugin ser, using the package
        grammar::peg.

    lemon
        Interprets the input as a grammar specification as understood
        by Richard Hipp's LEMON parser generator and generates a tree
        representation for it. Both the input syntax and the structure
        of the tree representation are explained in their own
        manpages.

    treeser
        Interprets the input as the serialization of a struct::tree.
        It is validated as such, but nothing else. It is not assumed
        to be the tree representation of a grammar.

-w name
    Writers. The name of the package for the plugin name is
    "page::writer::name".

    We have eight predefined plugins:

    identity
        Simply writes the incoming data as it is, without making any
        changes. This is good for inspecting the raw result of a
        reader or transformation.

    null
        Generates nothing, and ignores the incoming data structure.

    tree
        Assumes that the incoming data structure is a struct::tree and
        generates an indented textual representation of all nodes,
        their parental relationships, and their attribute information.

    peg
        Assumes that the incoming data structure is a tree
        representation of a PEG or other grammar and writes it out as
        a PEG. The result is nicely formatted and partially simplified
        (strings as sequences of characters). A pretty printer in
        essence, but it can also be used to obtain a canonical
        representation of the input grammar.

    tpc
        Assumes that the incoming data structure is a tree
        representation of a PEG or other grammar and writes out Tcl
        code defining a package which defines a grammar::peg object
        containing the grammar when it is loaded into an interpreter.

    hb
        This is like the writer plugin tpc, but it writes only the
        statements which define the start expression and grammar
        rules. The code making the result a package is left out.

    ser
        Assumes that the incoming data structure is a tree
        representation of a PEG or other grammar, transforms it
        internally into a grammar::peg object, and writes out its
        serialization.

    me
        Assumes that the incoming data structure is a tree
        representation of a PEG or other grammar and writes out Tcl
        code defining a package which implements a memoizing recursive
        descent parser based on the match engine (ME) provided by the
        package grammar::mengine.

-t name
    Transformers. The name of the package for the plugin name is
    "page::transform::name".

    We have two predefined plugins:

    reach
        Assumes that the incoming data structure is a tree
        representation of a PEG or other grammar. It determines which
        nonterminal symbols and rules are reachable from the start
        symbol/expression. All nonterminal symbols which were not
        reached are removed.

    use
        Assumes that the incoming data structure is a tree
        representation of a PEG or other grammar. It determines which
        nonterminal symbols and rules are able to generate a finite
        sequence of terminal symbols (in the sense of a Context Free
        Grammar). All nonterminal symbols which were not deemed useful
        in this sense are removed.
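
The two analyses behind the reach and use transformers are the classic
reachability and productivity computations on a grammar. The Python
sketch below uses a plain context-free representation (each
nonterminal maps to alternatives that are symbol sequences) rather
than page's PEG tree representation; all names are hypothetical, and
dropping alternatives that mention a removed nonterminal is an
assumption of the sketch, not documented behaviour of
page::transform::use.

```python
from typing import Dict, List, Set

# Nonterminal -> list of alternatives; each alternative is a symbol
# sequence. Any symbol that is not a key is treated as a terminal.
Grammar = Dict[str, List[List[str]]]


def reachable(grammar: Grammar, start: str) -> Set[str]:
    """Nonterminals reachable from the start symbol (depth-first search)."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        symbol = stack.pop()
        if symbol in seen or symbol not in grammar:
            continue  # already visited, or a terminal
        seen.add(symbol)
        for alternative in grammar[symbol]:
            stack.extend(alternative)
    return seen


def productive(grammar: Grammar) -> Set[str]:
    """Nonterminals that derive some finite terminal string (fixpoint)."""
    useful: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            if nonterminal in useful:
                continue
            if any(all(s in useful or s not in grammar for s in alt)
                   for alt in alternatives):
                useful.add(nonterminal)
                changed = True
    return useful


def remove_unreachable(grammar: Grammar, start: str) -> Grammar:
    keep = reachable(grammar, start)
    return {nt: alts for nt, alts in grammar.items() if nt in keep}


def remove_useless(grammar: Grammar) -> Grammar:
    keep = productive(grammar)
    # Assumption: alternatives mentioning a removed nonterminal are dropped too.
    return {nt: [alt for alt in alts
                 if all(s in keep or s not in grammar for s in alt)]
            for nt, alts in grammar.items() if nt in keep}


G = {
    "Expr": [["Term", "+", "Expr"], ["Term"]],
    "Term": [["num"]],
    "Loop": [["Loop", "x"]],   # never terminates: not productive
    "Orphan": [["num"]],       # never referenced: unreachable
}
print(sorted(reachable(G, "Expr")))  # ['Expr', 'Term']
print(sorted(productive(G)))         # ['Expr', 'Orphan', 'Term']
```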

PLUGIN LOCATIONS
The application-specific paths searched by page either are, or come
from:

[1] The directory "~/.page/plugin"

[2] The environment variable PAGE_PLUGINS

[3] The registry entry HKEY_LOCAL_MACHINE\SOFTWARE\PAGE\PLUGINS

[4] The registry entry HKEY_CURRENT_USER\SOFTWARE\PAGE\PLUGINS

The type-specific paths searched by page either are, or come from:

[1] The directory "~/.page/plugin/<TYPE>"

[2] The environment variable PAGE_<TYPE>_PLUGINS

[3] The registry entry HKEY_LOCAL_MACHINE\SOFTWARE\PAGE\<TYPE>\PLUGINS

[4] The registry entry HKEY_CURRENT_USER\SOFTWARE\PAGE\<TYPE>\PLUGINS

Where the placeholder <TYPE> is always one of the values below,
properly capitalized.

[1] reader

[2] writer

[3] transform

[4] config

The registry entries are specific to the Windows(tm) platform; all
other platforms ignore them.

The contents of both environment variables and registry entries are
interpreted as a list of paths, with the elements separated by either
a colon (Unix) or a semicolon (Windows).
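
Collecting the directory and environment-variable paths for one plugin
type might look like the Python sketch below. plugin_paths is a
hypothetical name; the registry entries are not modelled; the order
(directory first, then the variable's entries) simply follows the
listing above; and using the lowercase type in the directory but the
uppercase type in the variable name is an assumption about "properly
capitalized".

```python
import os


def plugin_paths(plugin_type=None, environ=None, home="~", sep=os.pathsep):
    """Application-specific (plugin_type=None) or type-specific paths.

    Registry entries are Windows-only and are not modelled here.
    """
    environ = os.environ if environ is None else environ
    if plugin_type is None:
        directory = os.path.join(home, ".page", "plugin")
        variable = "PAGE_PLUGINS"
    else:
        # Assumption: lowercase type in the directory, uppercase in the variable.
        directory = os.path.join(home, ".page", "plugin", plugin_type)
        variable = f"PAGE_{plugin_type.upper()}_PLUGINS"
    paths = [os.path.expanduser(directory)]
    # os.pathsep is ":" on Unix and ";" on Windows, matching the separators above.
    paths.extend(p for p in environ.get(variable, "").split(sep) if p)
    return paths
```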

BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain
bugs and other problems. Please report such in the category page of
the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please
also report any ideas for enhancements you may have for either package
and/or documentation.

When proposing code changes, please provide unified diffs, i.e. the
output of diff -u.

Note further that attachments are strongly preferred over inlined
patches. Attachments can be made by going to the Edit form of the
ticket immediately after its creation, and then using the left-most
button in the secondary navigation bar.

SEE ALSO
page::pluginmgr

KEYWORDS
parser generator, text processing

CATEGORY
Page Parser Generator

COPYRIGHT
Copyright (c) 2005 Andreas Kupries <andreas_kupries@users.sourceforge.net>


tcllib 1.0 page(n)