page(n)                                                                page(n)

______________________________________________________________________________

NAME
       page - Parser Generator

SYNOPSIS
       page ?options...? ?input ?output??

_________________________________________________________________

DESCRIPTION
       The application described by this document, page, is actually not just
       a parser generator, as the name implies, but a generic tool for the
       execution of arbitrary transformations on texts.

       Its genericity comes from the use of plugins for reading, transforming,
       and writing data. The predefined set of plugins provided by Tcllib is
       for the generation of memoizing recursive descent parsers (aka packrat
       parsers) from grammar specifications (Parsing Expression Grammars).

       page is written on top of the package page::pluginmgr, wrapping its
       functionality into a command line based application. All the other
       page::* packages are plugin and/or supporting packages for the
       generation of parsers. The parsers themselves are based on the packages
       grammar::peg, grammar::peg::interp, and grammar::mengine.

COMMAND LINE
       page ?options...? ?input ?output??
              This is the general form for calling page. The application will
              read the contents of the file input, process them under the
              control of the specified options, and then write the result to
              the file output.

              If input is the string -, the data to process will be read from
              stdin instead of a file. Analogously, the result will be written
              to stdout instead of a file if output is the string -. A missing
              input or output specification causes the application to assume
              -.

              The detailed specifications of the recognized options are
              provided in section OPTIONS.

              path input (in)
                     This argument specifies the path to the file to be
                     processed by the application, or -. The latter causes the
                     application to read the text from stdin. Otherwise the
                     file has to exist and be readable. If the argument is
                     missing, - is assumed.

              path output (in)
                     This argument specifies where to write the generated
                     text. It can be the path to a file, or -. The latter
                     causes the application to write the generated text to
                     stdout.

                     If the file output does not exist, then [file dirname
                     $output] has to exist and must be a writable directory,
                     as the application will create the file to write to.

                     If the argument is missing, - is assumed.

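       The input and output conventions above can be illustrated with a
       minimal Python sketch. It is not taken from page (an application
       written in Tcl); the function names open_input and open_output and the
       choice of UTF-8 encoding are assumptions made for the example. Passing
       None stands for a missing argument.

```python
import os
import sys


def open_input(path=None):
    """Return a text stream for the input argument.

    A missing argument (None) or "-" selects stdin; anything else must be
    an existing, readable file (open() raises otherwise).
    """
    if path is None or path == "-":
        return sys.stdin
    return open(path, encoding="utf-8")


def open_output(path=None):
    """Return a writable text stream for the output argument.

    A missing argument (None) or "-" selects stdout. A file that does not
    exist yet is created, which requires its parent directory to exist and
    to be writable.
    """
    if path is None or path == "-":
        return sys.stdout
    if not os.path.exists(path):
        # Mirrors [file dirname $output]: a bare file name means ".".
        parent = os.path.dirname(path) or "."
        if not (os.path.isdir(parent) and os.access(parent, os.W_OK)):
            raise OSError(f"{parent} is not a writable directory")
    return open(path, "w", encoding="utf-8")
```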
OPERATION

OPTIONS
       This section describes all the options available to the user of the
       application. Options are always processed in order, i.e. if both
       --help and --version are specified, the option encountered first has
       precedence.

       Unknown options specified before any of the options -rd, -wr, or -tr
       cause processing to abort with an error. Unknown options coming in
       between these options, or after the last of them, are assumed to
       always take a single argument and are associated with the last plugin
       option coming before them. They are checked after all the relevant
       plugins, and thus the options they understand, are known. I.e. such
       unknown options cause an error if and only if the plugin option they
       are associated with does not understand them and was not superseded by
       a plugin option coming after.

       Default options are used if and only if the command line did not
       contain any options at all. They will set the application up as a
       PEG-based parser generator. The exact list of options is

              -c peg

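       The ordering rules above can be sketched in Python: the first help or
       version option wins, the default option list -c peg applies only when
       no options were given, and unknown options are attached to the plugin
       option preceding them. This is an illustration, not page's
       implementation. The function name scan, its return layout, treating
       every reader/writer/transform spelling as a plugin option, and the
       assumptions that input/output arguments were already split off and
       that -c files are not expanded here are all hypothetical.

```python
# Option spellings are taken from this manpage; everything else is a sketch.
HELP = {"--help", "-h", "-?"}
VERSION = {"--version", "-V"}
PLUGIN_OPTIONS = {
    "-r": "reader", "-rd": "reader", "--reader": "reader",
    "-w": "writer", "-wr": "writer", "--writer": "writer",
    "-t": "transform", "-tr": "transform", "--transform": "transform",
}
FLAGS = {"-P", "-T", "-D", "-a", "--append", "-p", "--prepend", "--reset"}
CONFIG = {"-c", "--configuration"}
DEFAULT_OPTIONS = ["-c", "peg"]


def _value(args, i):
    """Return the single argument of the option at position i."""
    if i + 1 >= len(args):
        raise ValueError(f"option {args[i]} is missing its argument")
    return args[i + 1]


def scan(args):
    """Classify an option list (input/output arguments already removed)."""
    if not args:
        args = DEFAULT_OPTIONS  # defaults only when no options were given
    # Options are processed in order: the first help/version option wins
    # and all other arguments are ignored.
    for arg in args:
        if arg in HELP:
            return ("help",)
        if arg in VERSION:
            return ("version",)

    plugins = []  # [kind, name, {unknown option: value}] in order
    i = 0
    while i < len(args):
        opt = args[i]
        if opt in PLUGIN_OPTIONS:
            plugins.append([PLUGIN_OPTIONS[opt], _value(args, i), {}])
            i += 2
        elif opt in CONFIG:
            _value(args, i)  # loading/expanding the configuration: not modeled
            i += 2
        elif opt in FLAGS:
            i += 1
        elif not plugins:
            raise ValueError(f"unknown option {opt} before any plugin option")
        else:
            # Assumed to take one argument; attached to the last plugin
            # option and validated later by that plugin (not modeled).
            plugins[-1][2][opt] = _value(args, i)
            i += 2
    return ("run", plugins)
```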
       And now the recognized options and their arguments, if they have any:

       --help

       -h

       -?     When one of these options is found on the command line, all
              arguments coming before or after it are ignored. The application
              will print a short description of the recognized options and
              exit.

       --version

       -V     When one of these options is found on the command line, all
              arguments coming before or after it are ignored. The application
              will print its own revision and exit.

       -P     This option signals the application to activate visual feedback
              while reading the input.

       -T     This option signals the application to collect statistics while
              reading the input and to print them after reading has completed,
              before processing starts.

       -D     This option signals the application to activate logging in the
              Safe base, for the debugging of problems with plugins.

       -r parser

       -rd parser

       --reader parser
              These options specify the plugin the application has to use for
              reading the input. If the options are used multiple times, the
              last one will be used.

       -w generator

       -wr generator

       --writer generator
              These options specify the plugin the application has to use for
              generating and writing the final output. If the options are used
              multiple times, the last one will be used.

       -t process

       -tr process

       --transform process
              These options specify a plugin to run on the input. In contrast
              to readers and writers, each use does not supersede previous
              uses, but adds the chosen plugin to a list of transformations,
              either at the front or at the end, as determined by the last
              preceding use of either option -p or -a. The initial default is
              to append new transformations.

       -a

       --append
              These options signal the application that all following
              transformations should be added at the end of the list of
              transformations.

       -p

       --prepend
              These options signal the application that all following
              transformations should be added at the beginning of the list of
              transformations.

       --reset
              This option signals the application to clear the list of
              transformations. This is necessary to wipe out the default
              transformations used.

       -c file

       --configuration file
              This option causes the application to load a configuration file
              and/or plugin. This is a plugin which in essence provides a
              predefined set of command line options. They are processed
              exactly as if they had been specified in place of the option and
              its argument. This means that unknown options found at the
              beginning of the configuration file are associated with the last
              plugin, even if that plugin was specified before the
              configuration file itself. Conversely, unknown options coming
              after the configuration file can be associated with a plugin
              specified in the file.

              If the argument is a file which cannot be loaded as a plugin, the
              application will assume that its contents are a list of options
              and their arguments, separated by spaces, tabs, and newlines.
              Options and arguments containing spaces can be quoted via
              double-quotes (") and single quotes ('). The quote character can
              be specified within a quoted string by doubling it. Newlines in a
              quoted string are accepted as is.

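       Two of the rules in this section lend themselves to a short Python
       sketch: how -t/-tr/--transform, -a, -p, and --reset build the list of
       transformations, and how a configuration file that is not a plugin is
       split into options and arguments. Both functions are hypothetical
       illustrations, not page's code. Assumptions: build_transforms sees only
       transformation-related options, --reset leaves the append/prepend mode
       unchanged, and a quote character in the middle of an unquoted word is
       treated as an ordinary character.

```python
def build_transforms(args, initial=()):
    """Build the transformation list from -t/-tr/--transform, -a/--append,
    -p/--prepend and --reset (other options are assumed filtered out)."""
    transforms = list(initial)  # e.g. transformations set up earlier
    prepend = False  # the initial default is to append
    i = 0
    while i < len(args):
        opt = args[i]
        if opt in ("-a", "--append"):
            prepend = False
        elif opt in ("-p", "--prepend"):
            prepend = True
        elif opt == "--reset":
            transforms.clear()
        elif opt in ("-t", "-tr", "--transform"):
            i += 1
            if i >= len(args):
                raise ValueError(f"option {opt} is missing its argument")
            if prepend:
                transforms.insert(0, args[i])
            else:
                transforms.append(args[i])
        else:
            raise ValueError(f"not a transformation option: {opt}")
        i += 1
    return transforms


def tokenize_config(text):
    """Split configuration-file contents into options and arguments."""
    tokens, i, n = [], 0, len(text)
    while i < n:
        ch = text[i]
        if ch in " \t\n":
            i += 1  # separators: spaces, tabs, newlines
        elif ch in "\"'":
            quote, buf = ch, []
            i += 1
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted string")
                if text[i] == quote:
                    if i + 1 < n and text[i + 1] == quote:
                        buf.append(quote)  # doubled quote is a literal quote
                        i += 2
                        continue
                    i += 1  # closing quote ends the token
                    break
                buf.append(text[i])  # newlines inside quotes are kept
                i += 1
            tokens.append("".join(buf))
        else:
            start = i
            while i < n and text[i] not in " \t\n":
                i += 1
            tokens.append(text[start:i])
    return tokens
```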
PLUGINS
       page makes use of four different types of plugins, namely: readers,
       writers, transformations, and configurations. Here we provide only a
       basic introduction on how to use them from page. The exact APIs
       provided to and expected from the plugins can be found in the
       documentation for page::pluginmgr, for those who wish to write their
       own plugins.

       Plugins are specified as arguments to the options -r, -w, -t, -c, and
       their equivalent longer forms. See the section OPTIONS for reference.

       Each such argument will first be treated as the name of a file, and
       this file is loaded as the plugin. If, however, there is no file with
       that name, then it will be translated into the name of a package, and
       this package is then loaded. For each type of plugin, the package
       management searches not only the regular paths, but a set of
       application- and type-specific paths as well. Please see the section
       PLUGIN LOCATIONS for a listing of all paths and their sources.

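       The file-first, package-second lookup just described, combined with
       the package naming scheme given for each plugin type in the list that
       follows, can be sketched in Python. The function and table names are
       made up for this illustration, os.path.isfile stands in for "there is
       a file with that name", and the actual loading (including the search
       of the extra plugin paths) is left out.

```python
import os

# Package name prefixes as listed for the -c, -r, -w, and -t plugin types.
PACKAGE_PREFIX = {
    "config": "page::config::",
    "reader": "page::reader::",
    "writer": "page::writer::",
    "transform": "page::transform::",
}


def resolve_plugin(plugin_type, name):
    """Return ("file", name) if name names an existing file, otherwise
    ("package", <package name>) built from the plugin type."""
    if os.path.isfile(name):
        return ("file", name)
    return ("package", PACKAGE_PREFIX[plugin_type] + name)
```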
       -c name
              Configurations. The name of the package for the plugin name is
              "page::config::name".

              We have one predefined plugin:

              peg    It sets the application up as a parser generator
                     accepting parsing expression grammars and writing a
                     packrat parser in Tcl. The actual arguments it specifies
                     are:

                            --reset
                            --append
                            --reader peg
                            --transform reach
                            --transform use
                            --writer me

       -r name
              Readers. The name of the package for the plugin name is
              "page::reader::name".

              We have five predefined plugins:

              peg    Interprets the input as a parsing expression grammar (PEG)
                     and generates a tree representation for it. Both the
                     syntax of PEGs and the structure of the tree
                     representation are explained in their own manpages.

              hb     Interprets the input as Tcl code as generated by the writer
                     plugin hb and generates its tree representation.

              ser    Interprets the input as the serialization of a PEG, as
                     generated by the writer plugin ser, using the package
                     grammar::peg.

              lemon  Interprets the input as a grammar specification as
                     understood by Richard Hipp's LEMON parser generator and
                     generates a tree representation for it. Both the input
                     syntax and the structure of the tree representation are
                     explained in their own manpages.

              treeser
                     Interprets the input as the serialization of a
                     struct::tree. It is validated as such, but nothing else.
                     It is not assumed to be the tree representation of a
                     grammar.

       -w name
              Writers. The name of the package for the plugin name is
              "page::writer::name".

              We have eight predefined plugins:

              identity
                     Simply writes the incoming data as it is, without making
                     any changes. This is good for inspecting the raw result of
                     a reader or transformation.

              null   Generates nothing, and ignores the incoming data
                     structure.

              tree   Assumes that the incoming data structure is a struct::tree
                     and generates an indented textual representation of all
                     nodes, their parental relationships, and their attribute
                     information.

              peg    Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar and writes it
                     out as a PEG. The result is nicely formatted and
                     partially simplified (strings as sequences of
                     characters). A pretty printer in essence, but it can also
                     be used to obtain a canonical representation of the input
                     grammar.

              tpc    Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar and writes out
                     Tcl code defining a package which defines a grammar::peg
                     object containing the grammar when it is loaded into an
                     interpreter.

              hb     This is like the writer plugin tpc, but it writes only the
                     statements which define the start expression and grammar
                     rules. The code making the result a package is left out.

              ser    Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar, transforms it
                     internally into a grammar::peg object, and writes out its
                     serialization.

              me     Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar and writes out
                     Tcl code defining a package which implements a memoizing
                     recursive descent parser based on the match engine (ME)
                     provided by the package grammar::mengine.

       -t name
              Transformers. The name of the package for the plugin name is
              "page::transform::name".

              We have two predefined plugins:

              reach  Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar. It determines
                     which nonterminal symbols and rules are reachable from
                     the start symbol/expression. All nonterminal symbols
                     which were not reached are removed.

              use    Assumes that the incoming data structure is a tree
                     representation of a PEG or other grammar. It determines
                     which nonterminal symbols and rules are able to generate
                     a finite sequence of terminal symbols (in the sense of a
                     Context Free Grammar). All nonterminal symbols which were
                     not deemed useful in this sense are removed.

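       The transformations reach and use correspond to two classic grammar
       clean-ups: keeping only nonterminals reachable from the start
       expression, and keeping only nonterminals that can derive a finite
       terminal string (computed as a fixed point). The Python sketch below
       shows the idea on a hypothetical, simplified expression encoding
       (tuples tagged "t" for terminals, "n" for nonterminals, and "seq",
       "alt", "opt", "star", "plus" for operators). It does not use page's
       tree representation, omits lookahead operators, and all names in it
       are made up for the example.

```python
# Hypothetical expression encoding (not page's tree representation):
#   ("t", "a")              terminal a
#   ("n", "A")              nonterminal A
#   ("seq", e1, e2, ...)    sequence
#   ("alt", e1, e2, ...)    choice
#   ("opt", e), ("star", e), ("plus", e)


def nonterminals_in(expr):
    """Return the set of nonterminals mentioned in an expression."""
    kind = expr[0]
    if kind == "n":
        return {expr[1]}
    if kind == "t":
        return set()
    found = set()
    for sub in expr[1:]:
        found |= nonterminals_in(sub)
    return found


def reachable(rules, start):
    """Nonterminals reachable from the start expression (reach-like)."""
    seen, todo = set(), list(nonterminals_in(start))
    while todo:
        symbol = todo.pop()
        if symbol in seen or symbol not in rules:
            continue
        seen.add(symbol)
        todo.extend(nonterminals_in(rules[symbol]))
    return seen


def generates(expr, useful_set):
    """Can expr derive a finite terminal string, given useful_set?"""
    kind = expr[0]
    if kind == "t":
        return True
    if kind == "n":
        return expr[1] in useful_set
    if kind == "seq":
        return all(generates(e, useful_set) for e in expr[1:])
    if kind == "alt":
        return any(generates(e, useful_set) for e in expr[1:])
    if kind in ("opt", "star"):
        return True  # may match the empty string
    if kind == "plus":
        return generates(expr[1], useful_set)
    raise ValueError(f"unknown expression kind {kind!r}")


def useful(rules):
    """Nonterminals able to generate a finite terminal string (use-like)."""
    result, changed = set(), True
    while changed:  # iterate until no new symbol is found
        changed = False
        for symbol, expr in rules.items():
            if symbol not in result and generates(expr, result):
                result.add(symbol)
                changed = True
    return result


def prune(rules, keep):
    """Drop the rules of all nonterminals not in keep."""
    return {s: e for s, e in rules.items() if s in keep}
```

       Note that prune only deletes rule definitions. After the use step, a
       surviving expression may still refer to a removed symbol (for example
       in one branch of a choice); a complete implementation would have to
       handle such references as well.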
PLUGIN LOCATIONS
       The application-specific paths searched by page either are, or come
       from:

       [1]    The directory "~/.page/plugin"

       [2]    The environment variable PAGE_PLUGINS

       [3]    The registry entry HKEY_LOCAL_MACHINE\SOFTWARE\PAGE\PLUGINS

       [4]    The registry entry HKEY_CURRENT_USER\SOFTWARE\PAGE\PLUGINS

       The type-specific paths searched by page either are, or come from:

       [1]    The directory "~/.page/plugin/<TYPE>"

       [2]    The environment variable PAGE_<TYPE>_PLUGINS

       [3]    The registry entry HKEY_LOCAL_MACHINE\SOFTWARE\PAGE\<TYPE>\PLUGINS

       [4]    The registry entry HKEY_CURRENT_USER\SOFTWARE\PAGE\<TYPE>\PLUGINS

       where the placeholder <TYPE> is always one of the values below,
       properly capitalized.

       [1]    reader

       [2]    writer

       [3]    transform

       [4]    config

       The registry entries are specific to the Windows(tm) platform; all
       other platforms ignore them.

       The contents of both environment variables and registry entries are
       interpreted as a list of paths, with the elements separated by either
       colon (Unix) or semicolon (Windows).

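       The non-registry sources listed above can be combined into a search
       list as in the following Python sketch. It is not page's code, and
       several details are assumptions: the order of the returned list (the
       manpage only enumerates the sources), the use of the lowercase type
       name for the directory and the uppercase one for the environment
       variable, and skipping the Windows registry entries entirely.
       os.pathsep supplies the documented separator, colon on Unix and
       semicolon on Windows.

```python
import os


def _split_paths(value):
    # os.pathsep is ":" on Unix and ";" on Windows; empty entries are dropped.
    return [p for p in value.split(os.pathsep) if p]


def plugin_search_paths(plugin_type, environ=None, home=None):
    """Collect application- and type-specific plugin directories.

    plugin_type is one of reader, writer, transform, config. Registry
    entries (Windows only) are not read, and the ordering is assumed.
    """
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home
    base = os.path.join(home, ".page", "plugin")
    paths = [base]
    paths += _split_paths(environ.get("PAGE_PLUGINS", ""))
    paths.append(os.path.join(base, plugin_type))
    paths += _split_paths(environ.get(f"PAGE_{plugin_type.upper()}_PLUGINS", ""))
    return paths
```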
BUGS, IDEAS, FEEDBACK
       This document, and the application it describes, will undoubtedly
       contain bugs and other problems. Please report such in the category
       page of the Tcllib SF Trackers
       [http://sourceforge.net/tracker/?group_id=12883]. Please also report
       any ideas for enhancements you may have for either application and/or
       documentation.

SEE ALSO
       page::pluginmgr

KEYWORDS
       parser generator, text processing

COPYRIGHT
       Copyright (c) 2005 Andreas Kupries <andreas_kupries@users.sourceforge.net>

Development Tools                    1.0                                page(n)