1grammar::fa(n) Finite automaton operations and usage grammar::fa(n)
2
3
4
5______________________________________________________________________________
6
8 grammar::fa - Create and manipulate finite automatons
9
11 package require Tcl 8.4
12
13 package require snit 1.3
14
15 package require struct::list
16
17 package require struct::set
18
19 package require grammar::fa::op ?0.2?
20
21 package require grammar::fa ?0.4?
22
23 ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
24
25 faName option ?arg arg ...?
26
27 faName destroy
28
29 faName clear
30
31 faName = srcFA
32
33 faName --> dstFA
34
35 faName serialize
36
37 faName deserialize serialization
38
39 faName states
40
41 faName state add s1 ?s2 ...?
42
43 faName state delete s1 ?s2 ...?
44
45 faName state exists s
46
47 faName state rename s snew
48
49 faName startstates
50
51 faName start add s1 ?s2 ...?
52
53 faName start remove s1 ?s2 ...?
54
55 faName start? s
56
57 faName start?set stateset
58
59 faName finalstates
60
61 faName final add s1 ?s2 ...?
62
63 faName final remove s1 ?s2 ...?
64
65 faName final? s
66
67 faName final?set stateset
68
69 faName symbols
70
71 faName symbols@ s ?d?
72
73 faName symbols@set stateset
74
75 faName symbol add sym1 ?sym2 ...?
76
77 faName symbol delete sym1 ?sym2 ...?
78
79 faName symbol rename sym newsym
80
81 faName symbol exists sym
82
83 faName next s sym ?--> next?
84
85 faName !next s sym ?--> next?
86
87 faName nextset stateset sym
88
89 faName is deterministic
90
91 faName is complete
92
93 faName is useful
94
95 faName is epsilon-free
96
97 faName reachable_states
98
99 faName unreachable_states
100
101 faName reachable s
102
103 faName useful_states
104
105 faName unuseful_states
106
107 faName useful s
108
109 faName epsilon_closure s
110
111 faName reverse
112
113 faName complete
114
115 faName remove_eps
116
117 faName trim ?what?
118
119 faName determinize ?mapvar?
120
121 faName minimize ?mapvar?
122
123 faName complement
124
125 faName kleene
126
127 faName optional
128
129 faName union fa ?mapvar?
130
131 faName intersect fa ?mapvar?
132
133 faName difference fa ?mapvar?
134
135 faName concatenate fa ?mapvar?
136
137 faName fromRegex regex ?over?
138
139______________________________________________________________________________
140
142 This package provides a container class for finite automatons (Short:
143 FA). It allows the incremental definition of the automaton, its manip‐
144 ulation and querying of the definition. While the package provides
145 complex operations on the automaton (via package grammar::fa::op), it
146 does not have the ability to execute a definition for a stream of sym‐
147 bols. Use the packages grammar::fa::dacceptor and grammar::fa::dexec
148 for that. Another package related to this is grammar::fa::compiler. It
149 turns a FA into an executor class which has the definition of the FA
150 hardwired into it. The output of this package is configurable to suit a
151 large number of different implementation languages and paradigms.
152
153 For more information about what a finite automaton is see section
154 FINITE AUTOMATONS.
155
157 The package exports the API described here.
158
159 ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
160 Creates a new finite automaton with an associated global Tcl
161 command whose name is faName. This command may be used to invoke
162 various operations on the automaton. It has the following gen‐
163 eral form:
164
165 faName option ?arg arg ...?
166 Option and the args determine the exact behavior of the
167 command. See section FA METHODS for more explanations.
168 The new automaton will be empty if no src is specified.
169 Otherwise it will contain a copy of the definition con‐
170 tained in the src. The src has to be a FA object refer‐
171 ence for all operators except deserialize and fromRegex.
172 The deserialize operator requires src to be the serial‐
173 ization of a FA instead, and fromRegex takes a regular
174 expression in the form a of a syntax tree. See ::gram‐
175 mar::fa::op::fromRegex for more detail on that.
176
178 All automatons provide the following methods for their manipulation:
179
180 faName destroy
181 Destroys the automaton, including its storage space and associ‐
182 ated command.
183
184 faName clear
185 Clears out the definition of the automaton contained in faName,
186 but does not destroy the object.
187
188 faName = srcFA
189 Assigns the contents of the automaton contained in srcFA to
190 faName, overwriting any existing definition. This is the
191 assignment operator for automatons. It copies the automaton con‐
192 tained in the FA object srcFA over the automaton definition in
193 faName. The old contents of faName are deleted by this opera‐
194 tion.
195
196 This operation is in effect equivalent to
197
198
199
200 faName deserialize [srcFA serialize]
201
202
203 faName --> dstFA
204 This is the reverse assignment operator for automatons. It
205 copies the automation contained in the object faName over the
206 automaton definition in the object dstFA. The old contents of
207 dstFA are deleted by this operation.
208
209 This operation is in effect equivalent to
210
211
212
213 dstFA deserialize [faName serialize]
214
215
216 faName serialize
217 This method serializes the automaton stored in faName. In other
218 words it returns a tcl value completely describing that automa‐
219 ton. This allows, for example, the transfer of automatons over
220 arbitrary channels, persistence, etc. This method is also the
221 basis for both the copy constructor and the assignment operator.
222
223 The result of this method has to be semantically identical over
224 all implementations of the grammar::fa interface. This is what
225 will enable us to copy automatons between different implementa‐
226 tions of the same interface.
227
228 The result is a list of three elements with the following struc‐
229 ture:
230
231 [1] The constant string grammar::fa.
232
233 [2] A list containing the names of all known input symbols.
234 The order of elements in this list is not relevant.
235
236 [3] The last item in the list is a dictionary, however the
237 order of the keys is important as well. The keys are the
238 states of the serialized FA, and their order is the order
239 in which to create the states when deserializing. This is
240 relevant to preserve the order relationship between
241 states.
242
243 The value of each dictionary entry is a list of three
244 elements describing the state in more detail.
245
246 [1] A boolean flag. If its value is true then the
247 state is a start state, otherwise it is not.
248
249 [2] A boolean flag. If its value is true then the
250 state is a final state, otherwise it is not.
251
252 [3] The last element is a dictionary describing the
253 transitions for the state. The keys are symbols
254 (or the empty string), and the values are sets of
255 successor states.
256
257 Assuming the following FA (which describes the life of a truck driver
258 in a very simple way :)
259
260
261
262 Drive -- yellow --> Brake -- red --> (Stop) -- red/yellow --> Attention -- green --> Drive
263 (...) is the start state.
264
265
266 a possible serialization is
267
268
269
270 grammar::fa \\
271 {yellow red green red/yellow} \\
272 {Drive {0 0 {yellow Brake}} \\
273 Brake {0 0 {red Stop}} \\
274 Stop {1 0 {red/yellow Attention}} \\
275 Attention {0 0 {green Drive}}}
276
277
278 A possible one, because I did not care about creation order here
279
280 faName deserialize serialization
281 This is the complement to serialize. It replaces the automaton
282 definition in faName with the automaton described by the serial‐
283 ization value. The old contents of faName are deleted by this
284 operation.
285
286 faName states
287 Returns the set of all states known to faName.
288
289 faName state add s1 ?s2 ...?
290 Adds the states s1, s2, et cetera to the FA definition in
291 faName. The operation will fail any of the new states is already
292 declared.
293
294 faName state delete s1 ?s2 ...?
295 Deletes the state s1, s2, et cetera, and all associated informa‐
296 tion from the FA definition in faName. The latter means that the
297 information about in- or outbound transitions is deleted as
298 well. If the deleted state was a start or final state then this
299 information is invalidated as well. The operation will fail if
300 the state s is not known to the FA.
301
302 faName state exists s
303 A predicate. It tests whether the state s is known to the FA in
304 faName. The result is a boolean value. It will be set to true
305 if the state s is known, and false otherwise.
306
307 faName state rename s snew
308 Renames the state s to snew. Fails if s is not a known state.
309 Also fails if snew is already known as a state.
310
311 faName startstates
312 Returns the set of states which are marked as start states, also
313 known as initial states. See FINITE AUTOMATONS for explanations
314 what this means.
315
316 faName start add s1 ?s2 ...?
317 Mark the states s1, s2, et cetera in the FA faName as start (aka
318 initial).
319
320 faName start remove s1 ?s2 ...?
321 Mark the states s1, s2, et cetera in the FA faName as not start
322 (aka not accepting).
323
324 faName start? s
325 A predicate. It tests if the state s in the FA faName is start
326 or not. The result is a boolean value. It will be set to true
327 if the state s is start, and false otherwise.
328
329 faName start?set stateset
330 A predicate. It tests if the set of states stateset contains at
331 least one start state. They operation will fail if the set con‐
332 tains an element which is not a known state. The result is a
333 boolean value. It will be set to true if a start state is
334 present in stateset, and false otherwise.
335
336 faName finalstates
337 Returns the set of states which are marked as final states, also
338 known as accepting states. See FINITE AUTOMATONS for explana‐
339 tions what this means.
340
341 faName final add s1 ?s2 ...?
342 Mark the states s1, s2, et cetera in the FA faName as final (aka
343 accepting).
344
345 faName final remove s1 ?s2 ...?
346 Mark the states s1, s2, et cetera in the FA faName as not final
347 (aka not accepting).
348
349 faName final? s
350 A predicate. It tests if the state s in the FA faName is final
351 or not. The result is a boolean value. It will be set to true
352 if the state s is final, and false otherwise.
353
354 faName final?set stateset
355 A predicate. It tests if the set of states stateset contains at
356 least one final state. They operation will fail if the set con‐
357 tains an element which is not a known state. The result is a
358 boolean value. It will be set to true if a final state is
359 present in stateset, and false otherwise.
360
361 faName symbols
362 Returns the set of all symbols known to the FA faName.
363
364 faName symbols@ s ?d?
365 Returns the set of all symbols for which the state s has transi‐
366 tions. If the empty symbol is present then s has epsilon tran‐
367 sitions. If two states are specified the result is the set of
368 symbols which have transitions from s to t. This set may be
369 empty if there are no transitions between the two specified
370 states.
371
372 faName symbols@set stateset
373 Returns the set of all symbols for which at least one state in
374 the set of states stateset has transitions. In other words, the
375 union of [faName symbols@ s] for all states s in stateset. If
376 the empty symbol is present then at least one state contained in
377 stateset has epsilon transitions.
378
379 faName symbol add sym1 ?sym2 ...?
380 Adds the symbols sym1, sym2, et cetera to the FA definition in
381 faName. The operation will fail any of the symbols is already
382 declared. The empty string is not allowed as a value for the
383 symbols.
384
385 faName symbol delete sym1 ?sym2 ...?
386 Deletes the symbols sym1, sym2 et cetera, and all associated
387 information from the FA definition in faName. The latter means
388 that all transitions using the symbols are deleted as well. The
389 operation will fail if any of the symbols is not known to the
390 FA.
391
392 faName symbol rename sym newsym
393 Renames the symbol sym to newsym. Fails if sym is not a known
394 symbol. Also fails if newsym is already known as a symbol.
395
396 faName symbol exists sym
397 A predicate. It tests whether the symbol sym is known to the FA
398 in faName. The result is a boolean value. It will be set to
399 true if the symbol sym is known, and false otherwise.
400
401 faName next s sym ?--> next?
402 Define or query transition information.
403
404 If next is specified, then the method will add a transition from
405 the state s to the successor state next labeled with the symbol
406 sym to the FA contained in faName. The operation will fail if s,
407 or next are not known states, or if sym is not a known symbol.
408 An exception to the latter is that sym is allowed to be the
409 empty string. In that case the new transition is an epsilon
410 transition which will not consume input when traversed. The
411 operation will also fail if the combination of (s, sym, and
412 next) is already present in the FA.
413
414 If next was not specified, then the method will return the set
415 of states which can be reached from s through a single transi‐
416 tion labeled with symbol sym.
417
418 faName !next s sym ?--> next?
419 Remove one or more transitions from the Fa in faName.
420
421 If next was specified then the single transition from the state
422 s to the state next labeled with the symbol sym is removed from
423 the FA. Otherwise all transitions originating in state s and
424 labeled with the symbol sym will be removed.
425
426 The operation will fail if s and/or next are not known as
427 states. It will also fail if a non-empty sym is not known as
428 symbol. The empty string is acceptable, and allows the removal
429 of epsilon transitions.
430
431 faName nextset stateset sym
432 Returns the set of states which can be reached by a single tran‐
433 sition originating in a state in the set stateset and labeled
434 with the symbol sym.
435
436 In other words, this is the union of [faName next s symbol] for
437 all states s in stateset.
438
439 faName is deterministic
440 A predicate. It tests whether the FA in faName is a determinis‐
441 tic FA or not. The result is a boolean value. It will be set to
442 true if the FA is deterministic, and false otherwise.
443
444 faName is complete
445 A predicate. It tests whether the FA in faName is a complete FA
446 or not. A FA is complete if it has at least one transition per
447 state and symbol. This also means that a FA without symbols, or
448 states is also complete. The result is a boolean value. It will
449 be set to true if the FA is deterministic, and false otherwise.
450
451 Note: When a FA has epsilon-transitions transitions over a sym‐
452 bol for a state S can be indirect, i.e. not attached directly to
453 S, but to a state in the epsilon-closure of S. The symbols for
454 such indirect transitions count when computing completeness.
455
456 faName is useful
457 A predicate. It tests whether the FA in faName is an useful FA
458 or not. A FA is useful if all states are reachable and useful.
459 The result is a boolean value. It will be set to true if the FA
460 is deterministic, and false otherwise.
461
462 faName is epsilon-free
463 A predicate. It tests whether the FA in faName is an epsilon-
464 free FA or not. A FA is epsilon-free if it has no epsilon tran‐
465 sitions. This definition means that all deterministic FAs are
466 epsilon-free as well, and epsilon-freeness is a necessary pre-
467 condition for deterministic'ness. The result is a boolean
468 value. It will be set to true if the FA is deterministic, and
469 false otherwise.
470
471 faName reachable_states
472 Returns the set of states which are reachable from a start state
473 by one or more transitions.
474
475 faName unreachable_states
476 Returns the set of states which are not reachable from any start
477 state by any number of transitions. This is
478
479
480
481 [faName states] - [faName reachable_states]
482
483
484 faName reachable s
485 A predicate. It tests whether the state s in the FA faName can
486 be reached from a start state by one or more transitions. The
487 result is a boolean value. It will be set to true if the state
488 can be reached, and false otherwise.
489
490 faName useful_states
491 Returns the set of states which are able to reach a final state
492 by one or more transitions.
493
494 faName unuseful_states
495 Returns the set of states which are not able to reach a final
496 state by any number of transitions. This is
497
498
499
500 [faName states] - [faName useful_states]
501
502
503 faName useful s
504 A predicate. It tests whether the state s in the FA faName is
505 able to reach a final state by one or more transitions. The
506 result is a boolean value. It will be set to true if the state
507 is useful, and false otherwise.
508
509 faName epsilon_closure s
510 Returns the set of states which are reachable from the state s
511 in the FA faName by one or more epsilon transitions, i.e transi‐
512 tions over the empty symbol, transitions which do not consume
513 input. This is called the epsilon closure of s.
514
515 faName reverse
516
517 faName complete
518
519 faName remove_eps
520
521 faName trim ?what?
522
523 faName determinize ?mapvar?
524
525 faName minimize ?mapvar?
526
527 faName complement
528
529 faName kleene
530
531 faName optional
532
533 faName union fa ?mapvar?
534
535 faName intersect fa ?mapvar?
536
537 faName difference fa ?mapvar?
538
539 faName concatenate fa ?mapvar?
540
541 faName fromRegex regex ?over?
542 These methods provide more complex operations on the FA. Please
543 see the same-named commands in the package grammar::fa::op for
544 descriptions of what they do.
545
548 For the mathematically inclined, a FA is a 5-tuple (S,Sy,St,Fi,T) where
549
550 · S is a set of states,
551
552 · Sy a set of input symbols,
553
554 · St is a subset of S, the set of start states, also known as ini‐
555 tial states.
556
557 · Fi is a subset of S, the set of final states, also known as
558 accepting.
559
560 · T is a function from S x (Sy + epsilon) to {S}, the transition
561 function. Here epsilon denotes the empty input symbol and is
562 distinct from all symbols in Sy; and {S} is the set of subsets
563 of S. In other words, T maps a combination of State and Input
564 (which can be empty) to a set of successor states.
565
566 In computer theory a FA is most often shown as a graph where the nodes
567 represent the states, and the edges between the nodes encode the tran‐
568 sition function: For all n in S' = T (s, sy) we have one edge between
569 the nodes representing s and n resp., labeled with sy. The start and
570 accepting states are encoded through distinct visual markers, i.e. they
571 are attributes of the nodes.
572
573 FA's are used to process streams of symbols over Sy.
574
575 A specific FA is said to accept a finite stream sy_1 sy_2 ... sy_n if
576 there is a path in the graph of the FA beginning at a state in St and
577 ending at a state in Fi whose edges have the labels sy_1, sy_2, etc. to
578 sy_n. The set of all strings accepted by the FA is the language of the
579 FA. One important equivalence is that the set of languages which can be
580 accepted by an FA is the set of regular languages.
581
582 Another important concept is that of deterministic FAs. A FA is said to
583 be deterministic if for each string of input symbols there is exactly
584 one path in the graph of the FA beginning at the start state and whose
585 edges are labeled with the symbols in the string. While it might seem
586 that non-deterministic FAs to have more power of recognition, this is
587 not so. For each non-deterministic FA we can construct a deterministic
588 FA which accepts the same language (--> Thompson's subset construc‐
589 tion).
590
591 While one of the premier applications of FAs is in parsing, especially
592 in the lexer stage (where symbols == characters), this is not the only
593 possibility by far.
594
595 Quite a lot of processes can be modeled as a FA, albeit with a possibly
596 large set of states. For these the notion of accepting states is often
597 less or not relevant at all. What is needed instead is the ability to
598 act to state changes in the FA, i.e. to generate some output in
599 response to the input. This transforms a FA into a finite transducer,
600 which has an additional set OSy of output symbols and also an addi‐
601 tional output function O which maps from "S x (Sy + epsilon)" to "(Osy
602 + epsilon)", i.e a combination of state and input, possibly empty to an
603 output symbol, or nothing.
604
605 For the graph representation this means that edges are additional
606 labeled with the output symbol to write when this edge is traversed
607 while matching input. Note that for an application "writing an output
608 symbol" can also be "executing some code".
609
610 Transducers are not handled by this package. They will get their own
611 package in the future.
612
614 This document, and the package it describes, will undoubtedly contain
615 bugs and other problems. Please report such in the category grammar_fa
616 of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please
617 also report any ideas for enhancements you may have for either package
618 and/or documentation.
619
620 When proposing code changes, please provide unified diffs, i.e the out‐
621 put of diff -u.
622
623 Note further that attachments are strongly preferred over inlined
624 patches. Attachments can be made by going to the Edit form of the
625 ticket immediately after its creation, and then using the left-most
626 button in the secondary navigation bar.
627
629 automaton, finite automaton, grammar, parsing, regular expression, reg‐
630 ular grammar, regular languages, state, transducer
631
633 Grammars and finite automata
634
636 Copyright (c) 2004-2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>
637
638
639
640
641tcllib 0.4 grammar::fa(n)