1grammar::fa(n) Finite automaton operations and usage grammar::fa(n)
2
3
4
5______________________________________________________________________________
6
8 grammar::fa - Create and manipulate finite automatons
9
11 package require Tcl 8.4
12
13 package require snit
14
15 package require struct::list
16
17 package require struct::set
18
19 package require grammar::fa::op ?0.2?
20
21 package require grammar::fa ?0.3?
22
23 ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
24
25 faName option ?arg arg ...?
26
27 faName destroy
28
29 faName clear
30
31 faName = srcFA
32
33 faName --> dstFA
34
35 faName serialize
36
37 faName deserialize serialization
38
39 faName states
40
41 faName state add s1 ?s2 ...?
42
43 faName state delete s1 ?s2 ...?
44
45 faName state exists s
46
47 faName state rename s snew
48
49 faName startstates
50
51 faName start add s1 ?s2 ...?
52
53 faName start remove s1 ?s2 ...?
54
55 faName start? s
56
57 faName start?set stateset
58
59 faName finalstates
60
61 faName final add s1 ?s2 ...?
62
63 faName final remove s1 ?s2 ...?
64
65 faName final? s
66
67 faName final?set stateset
68
69 faName symbols
70
71 faName symbols@ s ?d?
72
73 faName symbols@set stateset
74
75 faName symbol add sym1 ?sym2 ...?
76
77 faName symbol delete sym1 ?sym2 ...?
78
79 faName symbol rename sym newsym
80
81 faName symbol exists sym
82
83 faName next s sym ?--> next?
84
85 faName !next s sym ?--> next?
86
87 faName nextset stateset sym
88
89 faName is deterministic
90
91 faName is complete
92
93 faName is useful
94
95 faName is epsilon-free
96
97 faName reachable_states
98
99 faName unreachable_states
100
101 faName reachable s
102
103 faName useful_states
104
105 faName unuseful_states
106
107 faName useful s
108
109 faName epsilon_closure s
110
111 faName reverse
112
113 faName complete
114
115 faName remove_eps
116
117 faName trim ?what?
118
119 faName determinize ?mapvar?
120
121 faName minimize ?mapvar?
122
123 faName complement
124
125 faName kleene
126
127 faName optional
128
129 faName union fa ?mapvar?
130
131 faName intersect fa ?mapvar?
132
133 faName difference fa ?mapvar?
134
135 faName concatenate fa ?mapvar?
136
137 faName fromRegex regex ?over?
138
139_________________________________________________________________
140
142 This package provides a container class for finite automatons (Short:
143 FA). It allows the incremental definition of the automaton, its manip‐
144 ulation and querying of the definition. While the package provides
145 complex operations on the automaton (via package grammar::fa::op), it
146 does not have the ability to execute a definition for a stream of sym‐
147 bols. Use the packages grammar::fa::dacceptor and grammar::fa::dexec
148 for that. Another package related to this is grammar::fa::compiler. It
149 turns a FA into an executor class which has the definition of the FA
150 hardwired into it. The output of this package is configurable to suit a
151 large number of different implementation languages and paradigms.
152
153 For more information about what a finite automaton is see section
154 FINITE AUTOMATONS.
155
157 The package exports the API described here.
158
159 ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
160 Creates a new finite automaton with an associated global Tcl
161 command whose name is faName. This command may be used to invoke
162 various operations on the automaton. It has the following gen‐
163 eral form:
164
165 faName option ?arg arg ...?
166 Option and the args determine the exact behavior of the
167 command. See section FA METHODS for more explanations.
168 The new automaton will be empty if no src is specified.
169 Otherwise it will contain a copy of the definition con‐
170 tained in the src. The src has to be a FA object refer‐
171 ence for all operators except deserialize and fromRegex.
172 The deserialize operator requires src to be the serial‐
173 ization of a FA instead, and fromRegex takes a regular
174 expression in the form a of a syntax tree. See ::gram‐
175 mar::fa::op::fromRegex for more detail on that.
176
178 All automatons provide the following methods for their manipulation:
179
180 faName destroy
181 Destroys the automaton, including its storage space and associ‐
182 ated command.
183
184 faName clear
185 Clears out the definition of the automaton contained in faName,
186 but does not destroy the object.
187
188 faName = srcFA
189 Assigns the contents of the automaton contained in srcFA to
190 faName, overwriting any existing definition. This is the
191 assignment operator for automatons. It copies the automaton con‐
192 tained in the FA object srcFA over the automaton definition in
193 faName. The old contents of faName are deleted by this opera‐
194 tion.
195
196 This operation is in effect equivalent to
197
198
199 faName deserialize [srcFA serialize]
200
201
202 faName --> dstFA
203 This is the reverse assignment operator for automatons. It
204 copies the automation contained in the object faName over the
205 automaton definition in the object dstFA. The old contents of
206 dstFA are deleted by this operation.
207
208 This operation is in effect equivalent to
209
210
211 dstFA deserialize [faName serialize]
212
213
214 faName serialize
215 This method serializes the automaton stored in faName. In other
216 words it returns a tcl value completely describing that automa‐
217 ton. This allows, for example, the transfer of automatons over
218 arbitrary channels, persistence, etc. This method is also the
219 basis for both the copy constructor and the assignment operator.
220
221 The result of this method has to be semantically identical over
222 all implementations of the grammar::fa interface. This is what
223 will enable us to copy automatons between different implementa‐
224 tions of the same interface.
225
226 The result is a list of three elements with the following struc‐
227 ture:
228
229 [1] The constant string grammar::fa.
230
231 [2] A list containing the names of all known input symbols.
232 The order of elements in this list is not relevant.
233
234 [3] The last item in the list is a dictionary, however the
235 order of the keys is important as well. The keys are the
236 states of the serialized FA, and their order is the order
237 in which to create the states when deserializing. This is
238 relevant to preserve the order relationship between
239 states.
240
241 The value of each dictionary entry is a list of three
242 elements describing the state in more detail.
243
244 [1] A boolean flag. If its value is true then the
245 state is a start state, otherwise it is not.
246
247 [2] A boolean flag. If its value is true then the
248 state is a final state, otherwise it is not.
249
250 [3] The last element is a dictionary describing the
251 transitions for the state. The keys are symbols
252 (or the empty string), and the values are sets of
253 successor states.
254
255 Assuming the following FA (which describes the life of a truck driver
256 in a very simple way :)
257
258
259 Drive -- yellow --> Brake -- red --> (Stop) -- red/yellow --> Attention -- green --> Drive
260 (...) is the start state.
261
262
263 a possible serialization is
264
265
266 grammar::fa \\
267 {yellow red green red/yellow} \\
268 {Drive {0 0 {yellow Brake}} \\
269 Brake {0 0 {red Stop}} \\
270 Stop {1 0 {red/yellow Attention}} \\
271 Attention {0 0 {green Drive}}}
272
273
274 A possible one, because I did not care about creation order here
275
276 faName deserialize serialization
277 This is the complement to serialize. It replaces the automaton
278 definition in faName with the automaton described by the serial‐
279 ization value. The old contents of faName are deleted by this
280 operation.
281
282 faName states
283 Returns the set of all states known to faName.
284
285 faName state add s1 ?s2 ...?
286 Adds the states s1, s2, et cetera to the FA definition in
287 faName. The operation will fail any of the new states is already
288 declared.
289
290 faName state delete s1 ?s2 ...?
291 Deletes the state s1, s2, et cetera, and all associated informa‐
292 tion from the FA definition in faName. The latter means that the
293 information about in- or outbound transitions is deleted as
294 well. If the deleted state was a start or final state then this
295 information is invalidated as well. The operation will fail if
296 the state s is not known to the FA.
297
298 faName state exists s
299 A predicate. It tests whether the state s is known to the FA in
300 faName. The result is a boolean value. It will be set to true
301 if the state s is known, and false otherwise.
302
303 faName state rename s snew
304 Renames the state s to snew. Fails if s is not a known state.
305 Also fails if snew is already known as a state.
306
307 faName startstates
308 Returns the set of states which are marked as start states, also
309 known as initial states. See FINITE AUTOMATONS for explanations
310 what this means.
311
312 faName start add s1 ?s2 ...?
313 Mark the states s1, s2, et cetera in the FA faName as start (aka
314 initial).
315
316 faName start remove s1 ?s2 ...?
317 Mark the states s1, s2, et cetera in the FA faName as not start
318 (aka not accepting).
319
320 faName start? s
321 A predicate. It tests if the state s in the FA faName is start
322 or not. The result is a boolean value. It will be set to true
323 if the state s is start, and false otherwise.
324
325 faName start?set stateset
326 A predicate. It tests if the set of states stateset contains at
327 least one start state. They operation will fail if the set con‐
328 tains an element which is not a known state. The result is a
329 boolean value. It will be set to true if a start state is
330 present in stateset, and false otherwise.
331
332 faName finalstates
333 Returns the set of states which are marked as final states, also
334 known as accepting states. See FINITE AUTOMATONS for explana‐
335 tions what this means.
336
337 faName final add s1 ?s2 ...?
338 Mark the states s1, s2, et cetera in the FA faName as final (aka
339 accepting).
340
341 faName final remove s1 ?s2 ...?
342 Mark the states s1, s2, et cetera in the FA faName as not final
343 (aka not accepting).
344
345 faName final? s
346 A predicate. It tests if the state s in the FA faName is final
347 or not. The result is a boolean value. It will be set to true
348 if the state s is final, and false otherwise.
349
350 faName final?set stateset
351 A predicate. It tests if the set of states stateset contains at
352 least one final state. They operation will fail if the set con‐
353 tains an element which is not a known state. The result is a
354 boolean value. It will be set to true if a final state is
355 present in stateset, and false otherwise.
356
357 faName symbols
358 Returns the set of all symbols known to the FA faName.
359
360 faName symbols@ s ?d?
361 Returns the set of all symbols for which the state s has transi‐
362 tions. If the empty symbol is present then s has epsilon tran‐
363 sitions. If two states are specified the result is the set of
364 symbols which have transitions from s to t. This set may be
365 empty if there are no transitions between the two specified
366 states.
367
368 faName symbols@set stateset
369 Returns the set of all symbols for which at least one state in
370 the set of states stateset has transitions. In other words, the
371 union of [faName symbols@ s] for all states s in stateset. If
372 the empty symbol is present then at least one state contained in
373 stateset has epsilon transitions.
374
375 faName symbol add sym1 ?sym2 ...?
376 Adds the symbols sym1, sym2, et cetera to the FA definition in
377 faName. The operation will fail any of the symbols is already
378 declared. The empty string is not allowed as a value for the
379 symbols.
380
381 faName symbol delete sym1 ?sym2 ...?
382 Deletes the symbols sym1, sym2 et cetera, and all associated
383 information from the FA definition in faName. The latter means
384 that all transitions using the symbols are deleted as well. The
385 operation will fail if any of the symbols is not known to the
386 FA.
387
388 faName symbol rename sym newsym
389 Renames the symbol sym to newsym. Fails if sym is not a known
390 symbol. Also fails if newsym is already known as a symbol.
391
392 faName symbol exists sym
393 A predicate. It tests whether the symbol sym is known to the FA
394 in faName. The result is a boolean value. It will be set to
395 true if the symbol sym is known, and false otherwise.
396
397 faName next s sym ?--> next?
398 Define or query transition information.
399
400 If next is specified, then the method will add a transition from
401 the state s to the successor state next labeled with the symbol
402 sym to the FA contained in faName. The operation will fail if s,
403 or next are not known states, or if sym is not a known symbol.
404 An exception to the latter is that sym is allowed to be the
405 empty string. In that case the new transition is an epsilon
406 transition which will not consume input when traversed. The
407 operation will also fail if the combination of (s, sym, and
408 next) is already present in the FA.
409
410 If next was not specified, then the method will return the set
411 of states which can be reached from s through a single transi‐
412 tion labeled with symbol sym.
413
414 faName !next s sym ?--> next?
415 Remove one or more transitions from the Fa in faName.
416
417 If next was specified then the single transition from the state
418 s to the state next labeled with the symbol sym is removed from
419 the FA. Otherwise all transitions originating in state s and
420 labeled with the symbol sym will be removed.
421
422 The operation will fail if s and/or next are not known as
423 states. It will also fail if a non-empty sym is not known as
424 symbol. The empty string is acceptable, and allows the removal
425 of epsilon transitions.
426
427 faName nextset stateset sym
428 Returns the set of states which can be reached by a single tran‐
429 sition originating in a state in the set stateset and labeled
430 with the symbol sym.
431
432 In other words, this is the union of [faName next s symbol] for
433 all states s in stateset.
434
435 faName is deterministic
436 A predicate. It tests whether the FA in faName is a determinis‐
437 tic FA or not. The result is a boolean value. It will be set to
438 true if the FA is deterministic, and false otherwise.
439
440 faName is complete
441 A predicate. It tests whether the FA in faName is a complete FA
442 or not. A FA is complete if it has at least one transition per
443 state and symbol. This also means that a FA without symbols, or
444 states is also complete. The result is a boolean value. It will
445 be set to true if the FA is deterministic, and false otherwise.
446
447 Note: When a FA has epsilon-transitions transitions over a sym‐
448 bol for a state S can be indirect, i.e. not attached directly to
449 S, but to a state in the epsilon-closure of S. The symbols for
450 such indirect transitions count when computing completeness.
451
452 faName is useful
453 A predicate. It tests whether the FA in faName is an useful FA
454 or not. A FA is useful if all states are reachable and useful.
455 The result is a boolean value. It will be set to true if the FA
456 is deterministic, and false otherwise.
457
458 faName is epsilon-free
459 A predicate. It tests whether the FA in faName is an epsilon-
460 free FA or not. A FA is epsilon-free if it has no epsilon tran‐
461 sitions. This definition means that all deterministic FAs are
462 epsilon-free as well, and epsilon-freeness is a necessary pre-
463 condition for deterministic'ness. The result is a boolean
464 value. It will be set to true if the FA is deterministic, and
465 false otherwise.
466
467 faName reachable_states
468 Returns the set of states which are reachable from a start state
469 by one or more transitions.
470
471 faName unreachable_states
472 Returns the set of states which are not reachable from any start
473 state by any number of transitions. This is
474
475
476 [faName states] - [faName reachable_states]
477
478
479 faName reachable s
480 A predicate. It tests whether the state s in the FA faName can
481 be reached from a start state by one or more transitions. The
482 result is a boolean value. It will be set to true if the state
483 can be reached, and false otherwise.
484
485 faName useful_states
486 Returns the set of states which are able to reach a final state
487 by one or more transitions.
488
489 faName unuseful_states
490 Returns the set of states which are not able to reach a final
491 state by any number of transitions. This is
492
493
494 [faName states] - [faName useful_states]
495
496
497 faName useful s
498 A predicate. It tests whether the state s in the FA faName is
499 able to reach a final state by one or more transitions. The
500 result is a boolean value. It will be set to true if the state
501 is useful, and false otherwise.
502
503 faName epsilon_closure s
504 Returns the set of states which are reachable from the state s
505 in the FA faName by one or more epsilon transitions, i.e transi‐
506 tions over the empty symbol, transitions which do not consume
507 input. This is called the epsilon closure of s.
508
509 faName reverse
510
511 faName complete
512
513 faName remove_eps
514
515 faName trim ?what?
516
517 faName determinize ?mapvar?
518
519 faName minimize ?mapvar?
520
521 faName complement
522
523 faName kleene
524
525 faName optional
526
527 faName union fa ?mapvar?
528
529 faName intersect fa ?mapvar?
530
531 faName difference fa ?mapvar?
532
533 faName concatenate fa ?mapvar?
534
535 faName fromRegex regex ?over?
536 These methods provide more complex operations on the FA. Please
537 see the same-named commands in the package grammar::fa::op for
538 descriptions of what they do.
539
542 For the mathematically inclined, a FA is a 5-tuple (S,Sy,St,Fi,T) where
543
544 · S is a set of states,
545
546 · Sy a set of input symbols,
547
548 · St is a subset of S, the set of start states, also known as ini‐
549 tial states.
550
551 · Fi is a subset of S, the set of final states, also known as
552 accepting.
553
554 · T is a function from S x (Sy + epsilon) to {S}, the transition
555 function. Here epsilon denotes the empty input symbol and is
556 distinct from all symbols in Sy; and {S} is the set of subsets
557 of S. In other words, T maps a combination of State and Input
558 (which can be empty) to a set of successor states.
559
560 In computer theory a FA is most often shown as a graph where the nodes
561 represent the states, and the edges between the nodes encode the tran‐
562 sition function: For all n in S' = T (s, sy) we have one edge between
563 the nodes representing s and n resp., labeled with sy. The start and
564 accepting states are encoded through distinct visual markers, i.e. they
565 are attributes of the nodes.
566
567 FA's are used to process streams of symbols over Sy.
568
569 A specific FA is said to accept a finite stream sy_1 sy_2 state in St
570 and ending at a state in Fi whose edges have the labels sy_1, sy_2,
571 etc. to sy_n. The set of all strings accepted by the FA is the lan‐
572 guage of the FA. One important equivalence is that the set of languages
573 which can be accepted by an FA is the set of regular languages.
574
575 Another important concept is that of deterministic FAs. A FA is said to
576 be deterministic if for each string of input symbols there is exactly
577 one path in the graph of the FA beginning at the start state and whose
578 edges are labeled with the symbols in the string. While it might seem
579 that non-deterministic FAs to have more power of recognition, this is
580 not so. For each non-deterministic FA we can construct a deterministic
581 FA which accepts the same language (--> Thompson's subset construc‐
582 tion).
583
584 While one of the premier applications of FAs is in parsing, especially
585 in the lexer stage (where symbols == characters), this is not the only
586 possibility by far.
587
588 Quite a lot of processes can be modeled as a FA, albeit with a possibly
589 large set of states. For these the notion of accepting states is often
590 less or not relevant at all. What is needed instead is the ability to
591 act to state changes in the FA, i.e. to generate some output in
592 response to the input. This transforms a FA into a finite transducer,
593 which has an additional set OSy of output symbols and also an addi‐
594 tional output function O which maps from "S x (Sy + epsilon)" to "(Osy
595 + epsilon)", i.e a combination of state and input, possibly empty to an
596 output symbol, or nothing.
597
598 For the graph representation this means that edges are additional
599 labeled with the output symbol to write when this edge is traversed
600 while matching input. Note that for an application "writing an output
601 symbol" can also be "executing some code".
602
603 Transducers are not handled by this package. They will get their own
604 package in the future.
605
607 This document, and the package it describes, will undoubtedly contain
608 bugs and other problems. Please report such in the category grammar_fa
609 of the Tcllib SF Trackers [http://source‐
610 forge.net/tracker/?group_id=12883]. Please also report any ideas for
611 enhancements you may have for either package and/or documentation.
612
614 automaton, finite automaton, grammar, parsing, regular expression, reg‐
615 ular grammar, regular languages, state, transducer
616
618 Copyright (c) 2004-2007 Andreas Kupries <andreas_kupries@users.sourceforge.net>
619
620
621
622
623grammar_fa 0.3 grammar::fa(n)