1grammar::fa::op(n) Finite automaton operations and usage grammar::fa::op(n)
2
3
4
5______________________________________________________________________________
6
8 grammar::fa::op - Operations on finite automatons
9
11 package require Tcl 8.4
12
13 package require snit
14
15 package require struct::list
16
17 package require struct::set
18
19 package require grammar::fa::op ?0.4.1?
20
21 ::grammar::fa::op::constructor cmd
22
23 ::grammar::fa::op::reverse fa
24
25 ::grammar::fa::op::complete fa ?sink?
26
27 ::grammar::fa::op::remove_eps fa
28
29 ::grammar::fa::op::trim fa ?what?
30
31 ::grammar::fa::op::determinize fa ?mapvar?
32
33 ::grammar::fa::op::minimize fa ?mapvar?
34
35 ::grammar::fa::op::complement fa
36
37 ::grammar::fa::op::kleene fa
38
39 ::grammar::fa::op::optional fa
40
41 ::grammar::fa::op::union fa fb ?mapvar?
42
43 ::grammar::fa::op::intersect fa fb ?mapvar?
44
45 ::grammar::fa::op::difference fa fb ?mapvar?
46
47 ::grammar::fa::op::concatenate fa fb ?mapvar?
48
49 ::grammar::fa::op::fromRegex fa regex ?over?
50
51 ::grammar::fa::op::toRegexp fa
52
53 ::grammar::fa::op::toRegexp2 fa
54
55 ::grammar::fa::op::toTclRegexp regexp symdict
56
57 ::grammar::fa::op::simplifyRegexp regexp
58
59______________________________________________________________________________
60
62 This package provides a number of complex operations on finite automa‐
63 tons (Short: FA), as provided by the package grammar::fa. The package
64 does not provide the ability to create and/or manipulate such FAs, nor
65 the ability to execute a FA for a stream of symbols. Use the packages
66 grammar::fa and grammar::fa::interpreter for that. Another package
67 related to this is grammar::fa::compiler which turns a FA into an
68 executor class which has the definition of the FA hardwired into it.
69
70 For more information about what a finite automaton is see section
71 FINITE AUTOMATONS in package grammar::fa.
72
74 The package exports the API described here. All commands modify their
75 first argument. I.e. whatever FA they compute is stored back into it.
76 Some of the operations will construct an automaton whose states are all
77 new, but related to the states in the source automaton(s). These opera‐
78 tions take variable names as optional arguments where they will store
79 mappings which describe the relationship(s). The operations can be
80 loosely partitioned into structural and language operations. The latter
81 are defined in terms of the language the automaton(s) accept, whereas
82 the former are defined in terms of the structural properties of the
83 involved automaton(s). Some operations are both. Structure operations
84
85 ::grammar::fa::op::constructor cmd
86 This command has to be called by the user of the package before
87 any other operations is performed, to establish a command which
88 can be used to construct a FA container object. If this is not
89 done several operations will fail as they are unable to con‐
90 struct internal and transient containers to hold state and/or
91 partial results.
92
93 Any container class using this package for complex operations
94 should set its own class command as the constructor. See package
95 grammar::fa for an example.
96
97 ::grammar::fa::op::reverse fa
98 Reverses the fa. This is done by reversing the direction of all
99 transitions and swapping the sets of start and final states. The
100 language of fa changes unpredictably.
101
102 ::grammar::fa::op::complete fa ?sink?
103 Completes the fa complete, but nothing is done if the fa is
104 already complete. This implies that only the first in a series
105 of multiple consecutive complete operations on fa will perform
106 anything. The remainder will be null operations.
107
108 The language of fa is unchanged by this operation.
109
110 This is done by adding a single new state, the sink, and transi‐
111 tions from all other states to that sink for all symbols they
112 have no transitions for. The sink itself is made complete by
113 adding loop transitions for all symbols.
114
115 Note: When a FA has epsilon-transitions transitions over a sym‐
116 bol for a state S can be indirect, i.e. not attached directly to
117 S, but to a state in the epsilon-closure of S. The symbols for
118 such indirect transitions count when computing completeness of a
119 state. In other words, these indirectly reached symbols are not
120 missing.
121
122 The argument sink provides the name for the new state and most
123 not be present in the fa if specified. If the name is not speci‐
124 fied the command will name the state "sinkn", where n is set so
125 that there are no collisions with existing states.
126
127 Note that the sink state is not useful by definition. In other
128 words, while the FA becomes complete, it is also not useful in
129 the strict sense as it has a state from which no final state can
130 be reached.
131
132 ::grammar::fa::op::remove_eps fa
133 Removes all epsilon-transitions from the fa in such a manner the
134 the language of fa is unchanged. However nothing is done if the
135 fa is already epsilon-free. This implies that only the first in
136 a series of multiple consecutive complete operations on fa will
137 perform anything. The remainder will be null operations.
138
139 Note: This operation may cause states to become unreachable or
140 not useful. These states are not removed by this operation. Use
141 ::grammar::fa::op::trim for that instead.
142
143 ::grammar::fa::op::trim fa ?what?
144 Removes unwanted baggage from fa. The legal values for what are
145 listed below. The command defaults to !reachable|!useful if no
146 specific argument was given.
147
148 !reachable
149 Removes all states which are not reachable from a start
150 state.
151
152 !useful
153 Removes all states which are unable to reach a final
154 state.
155
156 !reachable&!useful
157
158 !(reachable|useful)
159 Removes all states which are not reachable from a start
160 state and are unable to reach a final state.
161
162 !reachable|!useful
163
164 !(reachable&useful)
165 Removes all states which are not reachable from a start
166 state or are unable to reach a final state.
167
168
169 ::grammar::fa::op::determinize fa ?mapvar?
170 Makes the fa deterministic without changing the language
171 accepted by the fa. However nothing is done if the fa is already
172 deterministic. This implies that only the first in a series of
173 multiple consecutive complete operations on fa will perform any‐
174 thing. The remainder will be null operations.
175
176 The command will store a dictionary describing the relationship
177 between the new states of the resulting dfa and the states of
178 the input nfa in mapvar, if it has been specified. Keys of the
179 dictionary are the handles for the states of the resulting dfa,
180 values are sets of states from the input nfa.
181
182 Note: An empty dictionary signals that the command was able to
183 make the fa deterministic without performing a full subset con‐
184 struction, just by removing states and shuffling transitions
185 around (As part of making the FA epsilon-free).
186
187 Note: The algorithm fails to make the FA deterministic in the
188 technical sense if the FA has no start state(s), because deter‐
189 minism requires the FA to have exactly one start states. In
190 that situation we make a best effort; and the missing start
191 state will be the only condition preventing the generated result
192 from being deterministic. It should also be noted that in this
193 case the possibilities for trimming states from the FA are also
194 severely reduced as we cannot declare states unreachable.
195
196 ::grammar::fa::op::minimize fa ?mapvar?
197 Creates a FA which accepts the same language as fa, but has a
198 minimal number of states. Uses Brzozowski's method to accomplish
199 this.
200
201 The command will store a dictionary describing the relationship
202 between the new states of the resulting minimal fa and the
203 states of the input fa in mapvar, if it has been specified. Keys
204 of the dictionary are the handles for the states of the result‐
205 ing minimal fa, values are sets of states from the input fa.
206
207 Note: An empty dictionary signals that the command was able to
208 minimize the fa without having to compute new states. This
209 should happen if and only if the input FA was already minimal.
210
211 Note: If the algorithm has no start or final states to work with
212 then the result might be technically minimal, but have a very
213 unexpected structure. It should also be noted that in this case
214 the possibilities for trimming states from the FA are also se‐
215 verely reduced as we cannot declare states unreachable.
216
217 Language operations All operations in this section require that all
218 input FAs have at least one start and at least one final state. Other‐
219 wise the language of the FAs will not be defined, making the operation
220 senseless (as it operates on the languages of the FAs in a defined man‐
221 ner).
222
223 ::grammar::fa::op::complement fa
224 Complements fa. This is possible if and only if fa is complete
225 and deterministic. The resulting FA accepts the complementary
226 language of fa. In other words, all inputs not accepted by the
227 input are accepted by the result, and vice versa.
228
229 The result will have all states and transitions of the input,
230 and different final states.
231
232 ::grammar::fa::op::kleene fa
233 Applies Kleene's closure to fa. The resulting FA accepts all
234 strings S for which we can find a natural number n (0 inclusive)
235 and strings A1 ... An in the language of fa such that S is the
236 concatenation of A1 ... An. In other words, the language of the
237 result is the infinite union over finite length concatenations
238 over the language of fa.
239
240 The result will have all states and transitions of the input,
241 and new start and final states.
242
243 ::grammar::fa::op::optional fa
244 Makes the fa optional. In other words it computes the FA which
245 accepts the language of fa and the empty the word (epsilon) as
246 well.
247
248 The result will have all states and transitions of the input,
249 and new start and final states.
250
251 ::grammar::fa::op::union fa fb ?mapvar?
252 Combines the FAs fa and fb such that the resulting FA accepts
253 the union of the languages of the two FAs.
254
255 The result will have all states and transitions of the two input
256 FAs, and new start and final states. All states of fb which
257 exist in fa as well will be renamed, and the mapvar will contain
258 a mapping from the old states of fb to the new ones, if present.
259
260 It should be noted that the result will be non-deterministic,
261 even if the inputs are deterministic.
262
263 ::grammar::fa::op::intersect fa fb ?mapvar?
264 Combines the FAs fa and fb such that the resulting FA accepts
265 the intersection of the languages of the two FAs. In other
266 words, the result will accept a word if and only if the word is
267 accepted by both fa and fb. The result will be useful, but not
268 necessarily deterministic or minimal.
269
270 The command will store a dictionary describing the relationship
271 between the new states of the resulting fa and the pairs of
272 states of the input FAs in mapvar, if it has been specified.
273 Keys of the dictionary are the handles for the states of the
274 resulting fa, values are pairs of states from the input FAs.
275 Pairs are represented by lists. The first element in each pair
276 will be a state in fa, the second element will be drawn from fb.
277
278 ::grammar::fa::op::difference fa fb ?mapvar?
279 Combines the FAs fa and fb such that the resulting FA accepts
280 the difference of the languages of the two FAs. In other words,
281 the result will accept a word if and only if the word is
282 accepted by fa, but not by fb. This can also be expressed as the
283 intersection of fa with the complement of fb. The result will be
284 useful, but not necessarily deterministic or minimal.
285
286 The command will store a dictionary describing the relationship
287 between the new states of the resulting fa and the pairs of
288 states of the input FAs in mapvar, if it has been specified.
289 Keys of the dictionary are the handles for the states of the
290 resulting fa, values are pairs of states from the input FAs.
291 Pairs are represented by lists. The first element in each pair
292 will be a state in fa, the second element will be drawn from fb.
293
294 ::grammar::fa::op::concatenate fa fb ?mapvar?
295 Combines the FAs fa and fb such that the resulting FA accepts
296 the cross-product of the languages of the two FAs. I.e. a word W
297 will be accepted by the result if there are two words A and B
298 accepted by fa, and fb resp. and W is the concatenation of A and
299 B.
300
301 The result FA will be non-deterministic.
302
303 ::grammar::fa::op::fromRegex fa regex ?over?
304 Generates a non-deterministic FA which accepts the same language
305 as the regular expression regex. If the over is specified it is
306 treated as the set of symbols the regular expression and the au‐
307 tomaton are defined over. The command will compute the set from
308 the "S" constructors in regex when over was not specified. This
309 set is important if and only if the complement operator "!" is
310 used in regex as the complementary language of an FA is quite
311 different for different sets of symbols.
312
313 The regular expression is represented by a nested list, which
314 forms a syntax tree. The following structures are legal:
315
316 {S x} Atomic regular expression. Everything else is constructed
317 from these. Accepts the Symbol "x".
318
319 {. A1 A2 ...}
320 Concatenation operator. Accepts the concatenation of the
321 regular expressions A1, A2, etc.
322
323 Note that this operator accepts zero or more arguments.
324 With zero arguments the represented language is epsilon,
325 the empty word.
326
327 {| A1 A2 ...}
328 Choice operator, also called "Alternative". Accepts all
329 input accepted by at least one of the regular expressions
330 A1, A2, etc. In other words, the union of A1, A2.
331
332 Note that this operator accepts zero or more arguments.
333 With zero arguments the represented language is the empty
334 language, the language without words.
335
336 {& A1 A2 ...}
337 Intersection operator, logical and. Accepts all input
338 accepted which is accepted by all of the regular expres‐
339 sions A1, A2, etc. In other words, the intersection of
340 A1, A2.
341
342 {? A} Optionality operator. Accepts the empty word and anything
343 from the regular expression A.
344
345 {* A} Kleene closure. Accepts the empty word and any finite
346 concatenation of words accepted by the regular expression
347 A.
348
349 {+ A} Positive Kleene closure. Accepts any finite concatenation
350 of words accepted by the regular expression A, but not
351 the empty word.
352
353 {! A} Complement operator. Accepts any word not accepted by the
354 regular expression A. Note that the complement depends on
355 the set of symbol the result should run over. See the
356 discussion of the argument over before.
357
358 ::grammar::fa::op::toRegexp fa
359 This command generates and returns a regular expression which
360 accepts the same language as the finite automaton fa. The regu‐
361 lar expression is in the format as described above, for ::gram‐
362 mar::fa::op::fromRegex.
363
364 ::grammar::fa::op::toRegexp2 fa
365 This command has the same functionality as ::gram‐
366 mar::fa::op::toRegexp, but uses a different algorithm to sim‐
367 plify the generated regular expressions.
368
369 ::grammar::fa::op::toTclRegexp regexp symdict
370 This command generates and returns a regular expression in Tcl
371 syntax for the regular expression regexp, if that is possible.
372 regexp is in the same format as expected by ::gram‐
373 mar::fa::op::fromRegex.
374
375 The command will fail and throw an error if regexp contains com‐
376 plementation and intersection operations.
377
378 The argument symdict is a dictionary mapping symbol names to
379 pairs of syntactic type and Tcl-regexp. If a symbol occurring in
380 the regexp is not listed in this dictionary then single-charac‐
381 ter symbols are considered to designate themselves whereas mul‐
382 tiple-character symbols are considered to be a character class
383 name.
384
385 ::grammar::fa::op::simplifyRegexp regexp
386 This command simplifies a regular expression by applying the
387 following algorithm first to the main expression and then recur‐
388 sively to all sub-expressions:
389
390 [1] Convert the expression into a finite automaton.
391
392 [2] Minimize the automaton.
393
394 [3] Convert the automaton back to a regular expression.
395
396 [4] Choose the shorter of original expression and expression
397 from the previous step.
398
401 This document, and the package it describes, will undoubtedly contain
402 bugs and other problems. Please report such in the category grammar_fa
403 of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please
404 also report any ideas for enhancements you may have for either package
405 and/or documentation.
406
407 When proposing code changes, please provide unified diffs, i.e the out‐
408 put of diff -u.
409
410 Note further that attachments are strongly preferred over inlined
411 patches. Attachments can be made by going to the Edit form of the
412 ticket immediately after its creation, and then using the left-most
413 button in the secondary navigation bar.
414
416 automaton, finite automaton, grammar, parsing, regular expression, reg‐
417 ular grammar, regular languages, state, transducer
418
420 Grammars and finite automata
421
423 Copyright (c) 2004-2008 Andreas Kupries <andreas_kupries@users.sourceforge.net>
424
425
426
427
428tcllib 0.4 grammar::fa::op(n)