1grammar::fa(n) Finite automaton operations and usage grammar::fa(n)
2
3
4
5______________________________________________________________________________
6
8 grammar::fa - Create and manipulate finite automatons
9
11 package require Tcl 8.4
12
13 package require snit
14
15 package require struct::list
16
17 package require struct::set
18
19 package require grammar::fa::op ?0.2?
20
21 package require grammar::fa ?0.2?
22
23 ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
24
25 faName option ?arg arg ...?
26
27 faName destroy
28
29 faName clear
30
31 faName = srcFA
32
33 faName --> dstFA
34
35 faName serialize
36
37 faName deserialize serialization
38
39 faName states
40
41 faName state add s1 ?s2 ...?
42
43 faName state delete s1 ?s2 ...?
44
45 faName state exists s
46
47 faName state rename s snew
48
49 faName startstates
50
51 faName start add s1 ?s2 ...?
52
53 faName start remove s1 ?s2 ...?
54
55 faName start? s
56
57 faName start?set stateset
58
59 faName finalstates
60
61 faName final add s1 ?s2 ...?
62
63 faName final remove s1 ?s2 ...?
64
65 faName final? s
66
67 faName final?set stateset
68
69 faName symbols ?s?
70
71 faName symbolset stateset
72
73 faName symbol add sym1 ?sym2 ...?
74
75 faName symbol delete sym1 ?sym2 ...?
76
77 faName symbol rename sym newsym
78
79 faName symbol exists sym
80
81 faName next s sym ?--> next?
82
83 faName !next s sym ?--> next?
84
85 faName nextset stateset sym
86
87 faName is deterministic
88
89 faName is complete
90
91 faName is useful
92
93 faName is epsilon-free
94
95 faName reachable_states
96
97 faName unreachable_states
98
99 faName reachable s
100
101 faName useful_states
102
103 faName unuseful_states
104
105 faName useful s
106
107 faName epsilon_closure s
108
109 faName reverse
110
111 faName complete
112
113 faName remove_eps
114
115 faName trim ?what?
116
117 faName determinize ?mapvar?
118
119 faName minimize ?mapvar?
120
121 faName complement
122
123 faName kleene
124
125 faName optional
126
127 faName union fa ?mapvar?
128
129 faName intersect fa ?mapvar?
130
131 faName difference fa ?mapvar?
132
133 faName concatenate fa ?mapvar?
134
135 faName fromRegex regex ?over?
136
137_________________________________________________________________
138
140 This package provides a container class for finite automatons (Short:
141 FA). It allows the incremental definition of the automaton, its manip‐
142 ulation and querying of the definition. While the package provides
143 complex operations on the automaton (via package grammar::fa::op), it
144 does not have the ability to execute a definition for a stream of sym‐
145 bols. Use the packages grammar::fa::dacceptor and grammar::fa::dexec
146 for that. Another package related to this is grammar::fa::compiler. It
147 turns a FA into an executor class which has the definition of the FA
148 hardwired into it. The output of this package is configurable to suit a
149 large number of different implementation languages and paradigms.
150
151 For more information about what a finite automaton is see section
152 FINITE AUTOMATONS.
153
155 The package exports the API described here.
156
157 ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
158 Creates a new finite automaton with an associated global Tcl
159 command whose name is faName. This command may be used to invoke
160 various operations on the automaton. It has the following gen‐
161 eral form:
162
163 faName option ?arg arg ...?
164 Option and the args determine the exact behavior of the
165 command. See section FA METHODS for more explanations.
166 The new automaton will be empty if no src is specified.
167 Otherwise it will contain a copy of the definition con‐
168 tained in the src. The src has to be a FA object refer‐
169 ence for all operators except deserialize and fromRegex.
170 The deserialize operator requires src to be the serial‐
171 ization of a FA instead, and fromRegex takes a regular
172 expression in the form a of a syntax tree. See ::gram‐
173 mar::fa::op::fromRegex for more detail on that.
174
176 All automatons provide the following methods for their manipulation:
177
178 faName destroy
179 Destroys the automaton, including its storage space and associ‐
180 ated command.
181
182 faName clear
183 Clears out the definition of the automaton contained in faName,
184 but does not destroy the object.
185
186 faName = srcFA
187 Assigns the contents of the automaton contained in srcFA to
188 faName, overwriting any existing definition. This is the
189 assignment operator for automatons. It copies the automaton con‐
190 tained in the FA object srcFA over the automaton definition in
191 faName. The old contents of faName are deleted by this opera‐
192 tion.
193
194 This operation is in effect equivalent to
195
196
197 faName deserialize [srcFA serialize]
198
199
200 faName --> dstFA
201 This is the reverse assignment operator for automatons. It
202 copies the automation contained in the object faName over the
203 automaton definition in the object dstFA. The old contents of
204 dstFA are deleted by this operation.
205
206 This operation is in effect equivalent to
207
208
209 dstFA deserialize [faName serialize]
210
211
212 faName serialize
213 This method serializes the automaton stored in faName. In other
214 words it returns a tcl value completely describing that automa‐
215 ton. This allows, for example, the transfer of automatons over
216 arbitrary channels, persistence, etc. This method is also the
217 basis for both the copy constructor and the assignment operator.
218
219 The result of this method has to be semantically identical over
220 all implementations of the grammar::fa interface. This is what
221 will enable us to copy automatons between different implementa‐
222 tions of the same interface.
223
224 The result is a list of three elements with the following struc‐
225 ture:
226
227 [1] The constant string grammar::fa.
228
229 [2] A list containing the names of all known input symbols.
230 The order of elements in this list is not relevant.
231
232 [3] The last item in the list is a dictionary, however the
233 order of the keys is important as well. The keys are the
234 states of the serialized FA, and their order is the order
235 in which to create the states when deserializing. This is
236 relevant to preserve the order relationship between
237 states.
238
239 The value of each dictionary entry is a list of three
240 elements describing the state in more detail.
241
242 [1] A boolean flag. If its value is true then the
243 state is a start state, otherwise it is not.
244
245 [2] A boolean flag. If its value is true then the
246 state is a final state, otherwise it is not.
247
248 [3] The last element is a dictionary describing the
249 transitions for the state. The keys are symbols
250 (or the empty string), and the values are sets of
251 successor states.
252
253 Assuming the following FA (which describes the life of a truck driver
254 in a very simple way :)
255
256
257 Drive -- yellow --> Brake -- red --> (Stop) -- red/yellow --> Attention -- green --> Drive
258 (...) is the start state.
259
260
261 a possible serialization is
262
263
264 grammar::fa \\
265 {yellow red green red/yellow} \\
266 {Drive {0 0 {yellow Brake}} \\
267 Brake {0 0 {red Stop}} \\
268 Stop {1 0 {red/yellow Attention}} \\
269 Attention {0 0 {green Drive}}}
270
271
272 A possible one, because I did not care about creation order here
273
274 faName deserialize serialization
275 This is the complement to serialize. It replaces the automaton
276 definition in faName with the automaton described by the serial‐
277 ization value. The old contents of faName are deleted by this
278 operation.
279
280 faName states
281 Returns the set of all states known to faName.
282
283 faName state add s1 ?s2 ...?
284 Adds the states s1, s2, et cetera to the FA definition in
285 faName. The operation will fail any of the new states is already
286 declared.
287
288 faName state delete s1 ?s2 ...?
289 Deletes the state s1, s2, et cetera, and all associated informa‐
290 tion from the FA definition in faName. The latter means that the
291 information about in- or outbound transitions is deleted as
292 well. If the deleted state was a start or final state then this
293 information is invalidated as well. The operation will fail if
294 the state s is not known to the FA.
295
296 faName state exists s
297 A predicate. It tests whether the state s is known to the FA in
298 faName. The result is a boolean value. It will be set to true
299 if the state s is known, and false otherwise.
300
301 faName state rename s snew
302 Renames the state s to snew. Fails if s is not a known state.
303 Also fails if snew is already known as a state.
304
305 faName startstates
306 Returns the set of states which are marked as start states, also
307 known as initial states. See FINITE AUTOMATONS for explanations
308 what this means.
309
310 faName start add s1 ?s2 ...?
311 Mark the states s1, s2, et cetera in the FA faName as start (aka
312 initial).
313
314 faName start remove s1 ?s2 ...?
315 Mark the states s1, s2, et cetera in the FA faName as not start
316 (aka not accepting).
317
318 faName start? s
319 A predicate. It tests if the state s in the FA faName is start
320 or not. The result is a boolean value. It will be set to true
321 if the state s is start, and false otherwise.
322
323 faName start?set stateset
324 A predicate. It tests if the set of states stateset contains at
325 least one start state. They operation will fail if the set con‐
326 tains an element which is not a known state. The result is a
327 boolean value. It will be set to true if a start state is
328 present in stateset, and false otherwise.
329
330 faName finalstates
331 Returns the set of states which are marked as final states, also
332 known as accepting states. See FINITE AUTOMATONS for explana‐
333 tions what this means.
334
335 faName final add s1 ?s2 ...?
336 Mark the states s1, s2, et cetera in the FA faName as final (aka
337 accepting).
338
339 faName final remove s1 ?s2 ...?
340 Mark the states s1, s2, et cetera in the FA faName as not final
341 (aka not accepting).
342
343 faName final? s
344 A predicate. It tests if the state s in the FA faName is final
345 or not. The result is a boolean value. It will be set to true
346 if the state s is final, and false otherwise.
347
348 faName final?set stateset
349 A predicate. It tests if the set of states stateset contains at
350 least one final state. They operation will fail if the set con‐
351 tains an element which is not a known state. The result is a
352 boolean value. It will be set to true if a final state is
353 present in stateset, and false otherwise.
354
355 faName symbols ?s?
356 If no state s is specified then it returns the set of all sym‐
357 bols known to the FA faName. Otherwise it returns the set of all
358 symbols for which the state s has transitions. If the empty sym‐
359 bol is present then s has epsilon transitions.
360
361 faName symbolset stateset
362 Returns the set of all symbols for which at least one state in
363 the set of states stateset has transitions. In other words, the
364 union of [faName symbols s] for all states s in stateset. If
365 the empty symbol is present then at least one state contained in
366 stateset has epsilon transitions.
367
368 faName symbol add sym1 ?sym2 ...?
369 Adds the symbols sym1, sym2, et cetera to the FA definition in
370 faName. The operation will fail any of the symbols is already
371 declared. The empty string is not allowed as a value for the
372 symbols.
373
374 faName symbol delete sym1 ?sym2 ...?
375 Deletes the symbols sym1, sym2 et cetera, and all associated
376 information from the FA definition in faName. The latter means
377 that all transitions using the symbols are deleted as well. The
378 operation will fail if any of the symbols is not known to the
379 FA.
380
381 faName symbol rename sym newsym
382 Renames the symbol sym to newsym. Fails if sym is not a known
383 symbol. Also fails if newsym is already known as a symbol.
384
385 faName symbol exists sym
386 A predicate. It tests whether the symbol sym is known to the FA
387 in faName. The result is a boolean value. It will be set to
388 true if the symbol sym is known, and false otherwise.
389
390 faName next s sym ?--> next?
391 Define or query transition information.
392
393 If next is specified, then the method will add a transition from
394 the state s to the successor state next labeled with the symbol
395 sym to the FA contained in faName. The operation will fail if s,
396 or next are not known states, or if sym is not a known symbol.
397 An exception to the latter is that sym is allowed to be the
398 empty string. In that case the new transition is an epsilon
399 transition which will not consume input when traversed. The
400 operation will also fail if the combination of (s, sym, and
401 next) is already present in the FA.
402
403 If next was not specified, then the method will return the set
404 of states which can be reached from s through a single transi‐
405 tion labeled with symbol sym.
406
407 faName !next s sym ?--> next?
408 Remove one or more transitions from the Fa in faName.
409
410 If next was specified then the single transition from the state
411 s to the state next labeled with the symbol sym is removed from
412 the FA. Otherwise all transitions originating in state s and
413 labeled with the symbol sym will be removed.
414
415 The operation will fail if s and/or next are not known as
416 states. It will also fail if a non-empty sym is not known as
417 symbol. The empty string is acceptable, and allows the removal
418 of epsilon transitions.
419
420 faName nextset stateset sym
421 Returns the set of states which can be reached by a single tran‐
422 sition originating in a state in the set stateset and labeled
423 with the symbol sym.
424
425 In other words, this is the union of [faName next s symbol] for
426 all states s in stateset.
427
428 faName is deterministic
429 A predicate. It tests whether the FA in faName is a determinis‐
430 tic FA or not. The result is a boolean value. It will be set to
431 true if the FA is deterministic, and false otherwise.
432
433 faName is complete
434 A predicate. It tests whether the FA in faName is a complete FA
435 or not. A FA is complete if it has at least one transition per
436 state and symbol. This also means that a FA without symbols, or
437 states is also complete. The result is a boolean value. It will
438 be set to true if the FA is deterministic, and false otherwise.
439
440 Note: When a FA has epsilon-transitions transitions over a sym‐
441 bol for a state S can be indirect, i.e. not attached directly to
442 S, but to a state in the epsilon-closure of S. The symbols for
443 such indirect transitions count when computing completeness.
444
445 faName is useful
446 A predicate. It tests whether the FA in faName is an useful FA
447 or not. A FA is useful if all states are reachable and useful.
448 The result is a boolean value. It will be set to true if the FA
449 is deterministic, and false otherwise.
450
451 faName is epsilon-free
452 A predicate. It tests whether the FA in faName is an epsilon-
453 free FA or not. A FA is epsilon-free if it has no epsilon tran‐
454 sitions. This definition means that all deterministic FAs are
455 epsilon-free as well, and epsilon-freeness is a necessary pre-
456 condition for deterministic'ness. The result is a boolean
457 value. It will be set to true if the FA is deterministic, and
458 false otherwise.
459
460 faName reachable_states
461 Returns the set of states which are reachable from a start state
462 by one or more transitions.
463
464 faName unreachable_states
465 Returns the set of states which are not reachable from any start
466 state by any number of transitions. This is
467
468
469 [faName states] - [faName reachable_states]
470
471
472 faName reachable s
473 A predicate. It tests whether the state s in the FA faName can
474 be reached from a start state by one or more transitions. The
475 result is a boolean value. It will be set to true if the state
476 can be reached, and false otherwise.
477
478 faName useful_states
479 Returns the set of states which are able to reach a final state
480 by one or more transitions.
481
482 faName unuseful_states
483 Returns the set of states which are not able to reach a final
484 state by any number of transitions. This is
485
486
487 [faName states] - [faName useful_states]
488
489
490 faName useful s
491 A predicate. It tests whether the state s in the FA faName is
492 able to reach a final state by one or more transitions. The
493 result is a boolean value. It will be set to true if the state
494 is useful, and false otherwise.
495
496 faName epsilon_closure s
497 Returns the set of states which are reachable from the state s
498 in the FA faName by one or more epsilon transitions, i.e transi‐
499 tions over the empty symbol, transitions which do not consume
500 input. This is called the epsilon closure of s.
501
502 faName reverse
503
504 faName complete
505
506 faName remove_eps
507
508 faName trim ?what?
509
510 faName determinize ?mapvar?
511
512 faName minimize ?mapvar?
513
514 faName complement
515
516 faName kleene
517
518 faName optional
519
520 faName union fa ?mapvar?
521
522 faName intersect fa ?mapvar?
523
524 faName difference fa ?mapvar?
525
526 faName concatenate fa ?mapvar?
527
528 faName fromRegex regex ?over?
529 These methods provide more complex operations on the FA. Please
530 see the same-named commands in the package grammar::fa::op for
531 descriptions of what they do.
532
535 For the mathematically inclined, a FA is a 5-tuple (S,Sy,St,Fi,T) where
536
537 · S is a set of states,
538
539 · Sy a set of input symbols,
540
541 · St is a subset of S, the set of start states, also known as ini‐
542 tial states.
543
544 · Fi is a subset of S, the set of final states, also known as
545 accepting.
546
547 · T is a function from S x (Sy + epsilon) to {S}, the transition
548 function. Here epsilon denotes the empty input symbol and is
549 distinct from all symbols in Sy; and {S} is the set of subsets
550 of S. In other words, T maps a combination of State and Input
551 (which can be empty) to a set of successor states.
552
553 In computer theory a FA is most often shown as a graph where the nodes
554 represent the states, and the edges between the nodes encode the tran‐
555 sition function: For all n in S' = T (s, sy) we have one edge between
556 the nodes representing s and n resp., labeled with sy. The start and
557 accepting states are encoded through distinct visual markers, i.e. they
558 are attributes of the nodes.
559
560 FA's are used to process streams of symbols over Sy.
561
562 A specific FA is said to accept a finite stream sy_1 sy_2 state in St
563 and ending at a state in Fi whose edges have the labels sy_1, sy_2,
564 etc. to sy_n. The set of all strings accepted by the FA is the lan‐
565 guage of the FA. One important equivalence is that the set of languages
566 which can be accepted by an FA is the set of regular languages.
567
568 Another important concept is that of deterministic FAs. A FA is said to
569 be deterministic if for each string of input symbols there is exactly
570 one path in the graph of the FA beginning at the start state and whose
571 edges are labeled with the symbols in the string. While it might seem
572 that non-deterministic FAs to have more power of recognition, this is
573 not so. For each non-deterministic FA we can construct a deterministic
574 FA which accepts the same language (--> Thompson's subset construc‐
575 tion).
576
577 While one of the premier applications of FAs is in parsing, especially
578 in the lexer stage (where symbols == characters), this is not the only
579 possibility by far.
580
581 Quite a lot of processes can be modeled as a FA, albeit with a possibly
582 large set of states. For these the notion of accepting states is often
583 less or not relevant at all. What is needed instead is the ability to
584 act to state changes in the FA, i.e. to generate some output in
585 response to the input. This transforms a FA into a finite transducer,
586 which has an additional set OSy of output symbols and also an addi‐
587 tional output function O which maps from "S x (Sy + epsilon)" to "(Osy
588 + epsilon)", i.e a combination of state and input, possibly empty to an
589 output symbol, or nothing.
590
591 For the graph representation this means that edges are additional
592 labeled with the output symbol to write when this edge is traversed
593 while matching input. Note that for an application "writing an output
594 symbol" can also be "executing some code".
595
596 Transducers are not handled by this package. They will get their own
597 package in the future.
598
600 automaton, finite automaton, grammar, parsing, regular expression, reg‐
601 ular grammar, regular languages, state, transducer
602
604 Copyright (c) 2004 Andreas Kupries <andreas_kupries@users.sourceforge.net>
605
606
607
608
609grammar_fa 0.2 grammar::fa(n)