fa(n) - f14

1grammar::fa(n)       Finite automaton operations and usage      grammar::fa(n)
2
3
4
5______________________________________________________________________________
6

NAME

8       grammar::fa - Create and manipulate finite automatons
9

SYNOPSIS

11       package require Tcl  8.4
12
13       package require snit
14
15       package require struct::list
16
17       package require struct::set
18
19       package require grammar::fa::op  ?0.2?
20
21       package require grammar::fa  ?0.3?
22
23       ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
24
25       faName option ?arg arg ...?
26
27       faName destroy
28
29       faName clear
30
31       faName = srcFA
32
33       faName --> dstFA
34
35       faName serialize
36
37       faName deserialize serialization
38
39       faName states
40
41       faName state add s1 ?s2 ...?
42
43       faName state delete s1 ?s2 ...?
44
45       faName state exists s
46
47       faName state rename s snew
48
49       faName startstates
50
51       faName start add s1 ?s2 ...?
52
53       faName start remove s1 ?s2 ...?
54
55       faName start? s
56
57       faName start?set stateset
58
59       faName finalstates
60
61       faName final add s1 ?s2 ...?
62
63       faName final remove s1 ?s2 ...?
64
65       faName final? s
66
67       faName final?set stateset
68
69       faName symbols
70
71       faName symbols@ s ?d?
72
73       faName symbols@set stateset
74
75       faName symbol add sym1 ?sym2 ...?
76
77       faName symbol delete sym1 ?sym2 ...?
78
79       faName symbol rename sym newsym
80
81       faName symbol exists sym
82
83       faName next s sym ?--> next?
84
85       faName !next s sym ?--> next?
86
87       faName nextset stateset sym
88
89       faName is deterministic
90
91       faName is complete
92
93       faName is useful
94
95       faName is epsilon-free
96
97       faName reachable_states
98
99       faName unreachable_states
100
101       faName reachable s
102
103       faName useful_states
104
105       faName unuseful_states
106
107       faName useful s
108
109       faName epsilon_closure s
110
111       faName reverse
112
113       faName complete
114
115       faName remove_eps
116
117       faName trim ?what?
118
119       faName determinize ?mapvar?
120
121       faName minimize ?mapvar?
122
123       faName complement
124
125       faName kleene
126
127       faName optional
128
129       faName union fa ?mapvar?
130
131       faName intersect fa ?mapvar?
132
133       faName difference fa ?mapvar?
134
135       faName concatenate fa ?mapvar?
136
137       faName fromRegex regex ?over?
138
139_________________________________________________________________
140

DESCRIPTION

142       This  package  provides a container class for finite automatons (Short:
143       FA).  It allows the incremental definition of the automaton, its manip‐
144       ulation  and  querying  of  the definition.  While the package provides
145       complex operations on the automaton (via package  grammar::fa::op),  it
146       does  not have the ability to execute a definition for a stream of sym‐
147       bols.  Use the packages grammar::fa::dacceptor  and  grammar::fa::dexec
148       for that.  Another package related to this is grammar::fa::compiler. It
149       turns a FA into an executor class which has the definition  of  the  FA
150       hardwired into it. The output of this package is configurable to suit a
151       large number of different implementation languages and paradigms.
152
153       For more information about what  a  finite  automaton  is  see  section
154       FINITE AUTOMATONS.
155

API

157       The package exports the API described here.
158
159       ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
160              Creates  a  new  finite  automaton with an associated global Tcl
161              command whose name is faName. This command may be used to invoke
162              various  operations  on the automaton. It has the following gen‐
163              eral form:
164
165              faName option ?arg arg ...?
166                     Option and the args determine the exact behavior  of  the
167                     command.  See  section  FA METHODS for more explanations.
168                     The new automaton will be empty if no src  is  specified.
169                     Otherwise  it  will contain a copy of the definition con‐
170                     tained in the src.  The src has to be a FA object  refer‐
171                     ence  for all operators except deserialize and fromRegex.
172                     The deserialize operator requires src to be  the  serial‐
173                     ization  of  a  FA instead, and fromRegex takes a regular
174                     expression in the form a of a syntax  tree.  See  ::gram‐
175                     mar::fa::op::fromRegex for more detail on that.
176

FA METHODS

178       All automatons provide the following methods for their manipulation:
179
180       faName destroy
181              Destroys  the automaton, including its storage space and associ‐
182              ated command.
183
184       faName clear
185              Clears out the definition of the automaton contained in  faName,
186              but does not destroy the object.
187
188       faName = srcFA
189              Assigns  the  contents  of  the  automaton contained in srcFA to
190              faName,  overwriting  any  existing  definition.   This  is  the
191              assignment operator for automatons. It copies the automaton con‐
192              tained in the FA object srcFA over the automaton  definition  in
193              faName.  The  old  contents of faName are deleted by this opera‐
194              tion.
195
196              This operation is in effect equivalent to
197
198
199                  faName deserialize [srcFA serialize]
200
201
202       faName --> dstFA
203              This is the  reverse  assignment  operator  for  automatons.  It
204              copies  the  automation  contained in the object faName over the
205              automaton definition in the object dstFA.  The old  contents  of
206              dstFA are deleted by this operation.
207
208              This operation is in effect equivalent to
209
210
211                  dstFA deserialize [faName serialize]
212
213
214       faName serialize
215              This  method serializes the automaton stored in faName. In other
216              words it returns a tcl value completely describing that  automa‐
217              ton.   This allows, for example, the transfer of automatons over
218              arbitrary channels, persistence, etc.  This method is  also  the
219              basis for both the copy constructor and the assignment operator.
220
221              The  result of this method has to be semantically identical over
222              all implementations of the grammar::fa interface. This  is  what
223              will  enable us to copy automatons between different implementa‐
224              tions of the same interface.
225
226              The result is a list of three elements with the following struc‐
227              ture:
228
229              [1]    The constant string grammar::fa.
230
231              [2]    A  list  containing the names of all known input symbols.
232                     The order of elements in this list is not relevant.
233
234              [3]    The last item in the list is a  dictionary,  however  the
235                     order  of the keys is important as well. The keys are the
236                     states of the serialized FA, and their order is the order
237                     in which to create the states when deserializing. This is
238                     relevant  to  preserve  the  order  relationship  between
239                     states.
240
241                     The  value  of  each  dictionary entry is a list of three
242                     elements describing the state in more detail.
243
244                     [1]    A boolean flag. If its  value  is  true  then  the
245                            state is a start state, otherwise it is not.
246
247                     [2]    A  boolean  flag.  If  its  value is true then the
248                            state is a final state, otherwise it is not.
249
250                     [3]    The last element is a  dictionary  describing  the
251                            transitions  for  the  state. The keys are symbols
252                            (or the empty string), and the values are sets  of
253                            successor states.
254
255       Assuming  the  following FA (which describes the life of a truck driver
256       in a very simple way :)
257
258
259           Drive -- yellow --> Brake -- red --> (Stop) -- red/yellow --> Attention -- green --> Drive
260           (...) is the start state.
261
262
263       a possible serialization is
264
265
266           grammar::fa \\
267           {yellow red green red/yellow} \\
268           {Drive     {0 0 {yellow     Brake}} \\
269            Brake     {0 0 {red        Stop}} \\
270            Stop      {1 0 {red/yellow Attention}} \\
271            Attention {0 0 {green      Drive}}}
272
273
274       A possible one, because I did not care about creation order here
275
276       faName deserialize serialization
277              This is the complement to serialize. It replaces  the  automaton
278              definition in faName with the automaton described by the serial‐
279              ization value. The old contents of faName are  deleted  by  this
280              operation.
281
282       faName states
283              Returns the set of all states known to faName.
284
285       faName state add s1 ?s2 ...?
286              Adds  the  states  s1,  s2,  et  cetera  to the FA definition in
287              faName. The operation will fail any of the new states is already
288              declared.
289
290       faName state delete s1 ?s2 ...?
291              Deletes the state s1, s2, et cetera, and all associated informa‐
292              tion from the FA definition in faName. The latter means that the
293              information  about  in-  or  outbound  transitions is deleted as
294              well. If the deleted state was a start or final state then  this
295              information  is  invalidated as well. The operation will fail if
296              the state s is not known to the FA.
297
298       faName state exists s
299              A predicate. It tests whether the state s is known to the FA  in
300              faName.   The  result is a boolean value. It will be set to true
301              if the state s is known, and false otherwise.
302
303       faName state rename s snew
304              Renames the state s to snew. Fails if s is not  a  known  state.
305              Also fails if snew is already known as a state.
306
307       faName startstates
308              Returns the set of states which are marked as start states, also
309              known as initial states.  See FINITE AUTOMATONS for explanations
310              what this means.
311
312       faName start add s1 ?s2 ...?
313              Mark the states s1, s2, et cetera in the FA faName as start (aka
314              initial).
315
316       faName start remove s1 ?s2 ...?
317              Mark the states s1, s2, et cetera in the FA faName as not  start
318              (aka not accepting).
319
320       faName start? s
321              A  predicate.  It tests if the state s in the FA faName is start
322              or not.  The result is a boolean value. It will be set  to  true
323              if the state s is start, and false otherwise.
324
325       faName start?set stateset
326              A  predicate. It tests if the set of states stateset contains at
327              least one start state. They operation will fail if the set  con‐
328              tains  an  element  which is not a known state.  The result is a
329              boolean value. It will be set  to  true  if  a  start  state  is
330              present in stateset, and false otherwise.
331
332       faName finalstates
333              Returns the set of states which are marked as final states, also
334              known as accepting states.  See FINITE AUTOMATONS  for  explana‐
335              tions what this means.
336
337       faName final add s1 ?s2 ...?
338              Mark the states s1, s2, et cetera in the FA faName as final (aka
339              accepting).
340
341       faName final remove s1 ?s2 ...?
342              Mark the states s1, s2, et cetera in the FA faName as not  final
343              (aka not accepting).
344
345       faName final? s
346              A  predicate.  It tests if the state s in the FA faName is final
347              or not.  The result is a boolean value. It will be set  to  true
348              if the state s is final, and false otherwise.
349
350       faName final?set stateset
351              A  predicate. It tests if the set of states stateset contains at
352              least one final state. They operation will fail if the set  con‐
353              tains  an  element  which is not a known state.  The result is a
354              boolean value. It will be set  to  true  if  a  final  state  is
355              present in stateset, and false otherwise.
356
357       faName symbols
358              Returns the set of all symbols known to the FA faName.
359
360       faName symbols@ s ?d?
361              Returns the set of all symbols for which the state s has transi‐
362              tions.  If the empty symbol is present then s has epsilon  tran‐
363              sitions.  If  two  states are specified the result is the set of
364              symbols which have transitions from s to  t.  This  set  may  be
365              empty  if  there  are  no  transitions between the two specified
366              states.
367
368       faName symbols@set stateset
369              Returns the set of all symbols for which at least one  state  in
370              the set of states stateset has transitions.  In other words, the
371              union of [faName symbols@ s] for all states s in  stateset.   If
372              the empty symbol is present then at least one state contained in
373              stateset has epsilon transitions.
374
375       faName symbol add sym1 ?sym2 ...?
376              Adds the symbols sym1, sym2, et cetera to the FA  definition  in
377              faName.  The  operation  will fail any of the symbols is already
378              declared. The empty string is not allowed as  a  value  for  the
379              symbols.
380
381       faName symbol delete sym1 ?sym2 ...?
382              Deletes  the  symbols  sym1,  sym2 et cetera, and all associated
383              information from the FA definition in faName. The  latter  means
384              that  all transitions using the symbols are deleted as well. The
385              operation will fail if any of the symbols is not  known  to  the
386              FA.
387
388       faName symbol rename sym newsym
389              Renames  the  symbol  sym to newsym. Fails if sym is not a known
390              symbol. Also fails if newsym is already known as a symbol.
391
392       faName symbol exists sym
393              A predicate. It tests whether the symbol sym is known to the  FA
394              in  faName.   The  result  is a boolean value. It will be set to
395              true if the symbol sym is known, and false otherwise.
396
397       faName next s sym ?--> next?
398              Define or query transition information.
399
400              If next is specified, then the method will add a transition from
401              the  state s to the successor state next labeled with the symbol
402              sym to the FA contained in faName. The operation will fail if s,
403              or  next  are not known states, or if sym is not a known symbol.
404              An exception to the latter is that sym  is  allowed  to  be  the
405              empty  string.  In  that  case  the new transition is an epsilon
406              transition which will not  consume  input  when  traversed.  The
407              operation  will  also  fail  if  the combination of (s, sym, and
408              next) is already present in the FA.
409
410              If next was not specified, then the method will return  the  set
411              of  states  which can be reached from s through a single transi‐
412              tion labeled with symbol sym.
413
414       faName !next s sym ?--> next?
415              Remove one or more transitions from the Fa in faName.
416
417              If next was specified then the single transition from the  state
418              s  to the state next labeled with the symbol sym is removed from
419              the FA. Otherwise all transitions originating  in  state  s  and
420              labeled with the symbol sym will be removed.
421
422              The  operation  will  fail  if  s  and/or  next are not known as
423              states. It will also fail if a non-empty sym  is  not  known  as
424              symbol.  The  empty string is acceptable, and allows the removal
425              of epsilon transitions.
426
427       faName nextset stateset sym
428              Returns the set of states which can be reached by a single tran‐
429              sition  originating  in  a state in the set stateset and labeled
430              with the symbol sym.
431
432              In other words, this is the union of [faName next s symbol]  for
433              all states s in stateset.
434
435       faName is deterministic
436              A  predicate. It tests whether the FA in faName is a determinis‐
437              tic FA or not.  The result is a boolean value. It will be set to
438              true if the FA is deterministic, and false otherwise.
439
440       faName is complete
441              A  predicate. It tests whether the FA in faName is a complete FA
442              or not. A FA is complete if it has at least one  transition  per
443              state  and symbol. This also means that a FA without symbols, or
444              states is also complete.  The result is a boolean value. It will
445              be set to true if the FA is deterministic, and false otherwise.
446
447              Note:  When a FA has epsilon-transitions transitions over a sym‐
448              bol for a state S can be indirect, i.e. not attached directly to
449              S,  but  to a state in the epsilon-closure of S. The symbols for
450              such indirect transitions count when computing completeness.
451
452       faName is useful
453              A predicate. It tests whether the FA in faName is an  useful  FA
454              or  not.  A FA is useful if all states are reachable and useful.
455              The result is a boolean value. It will be set to true if the  FA
456              is deterministic, and false otherwise.
457
458       faName is epsilon-free
459              A  predicate.  It  tests whether the FA in faName is an epsilon-
460              free FA or not. A FA is epsilon-free if it has no epsilon  tran‐
461              sitions.  This  definition  means that all deterministic FAs are
462              epsilon-free as well, and epsilon-freeness is a  necessary  pre-
463              condition  for  deterministic'ness.   The  result  is  a boolean
464              value. It will be set to true if the FA  is  deterministic,  and
465              false otherwise.
466
467       faName reachable_states
468              Returns the set of states which are reachable from a start state
469              by one or more transitions.
470
471       faName unreachable_states
472              Returns the set of states which are not reachable from any start
473              state by any number of transitions. This is
474
475
476                    [faName states] - [faName reachable_states]
477
478
479       faName reachable s
480              A  predicate.  It tests whether the state s in the FA faName can
481              be reached from a start state by one or more  transitions.   The
482              result  is  a boolean value. It will be set to true if the state
483              can be reached, and false otherwise.
484
485       faName useful_states
486              Returns the set of states which are able to reach a final  state
487              by one or more transitions.
488
489       faName unuseful_states
490              Returns  the  set  of states which are not able to reach a final
491              state by any number of transitions. This is
492
493
494                    [faName states] - [faName useful_states]
495
496
497       faName useful s
498              A predicate. It tests whether the state s in the  FA  faName  is
499              able  to  reach  a  final state by one or more transitions.  The
500              result is a boolean value. It will be set to true if  the  state
501              is useful, and false otherwise.
502
503       faName epsilon_closure s
504              Returns  the  set of states which are reachable from the state s
505              in the FA faName by one or more epsilon transitions, i.e transi‐
506              tions  over  the  empty symbol, transitions which do not consume
507              input. This is called the epsilon closure of s.
508
509       faName reverse
510
511       faName complete
512
513       faName remove_eps
514
515       faName trim ?what?
516
517       faName determinize ?mapvar?
518
519       faName minimize ?mapvar?
520
521       faName complement
522
523       faName kleene
524
525       faName optional
526
527       faName union fa ?mapvar?
528
529       faName intersect fa ?mapvar?
530
531       faName difference fa ?mapvar?
532
533       faName concatenate fa ?mapvar?
534
535       faName fromRegex regex ?over?
536              These methods provide more complex operations on the FA.  Please
537              see  the  same-named commands in the package grammar::fa::op for
538              descriptions of what they do.
539

EXAMPLES

FINITE AUTOMATONS

542       For the mathematically inclined, a FA is a 5-tuple (S,Sy,St,Fi,T) where
543
544       ·      S is a set of states,
545
546       ·      Sy a set of input symbols,
547
548       ·      St is a subset of S, the set of start states, also known as ini‐
549              tial states.
550
551       ·      Fi  is  a  subset  of  S, the set of final states, also known as
552              accepting.
553
554       ·      T is a function from S x (Sy + epsilon) to {S},  the  transition
555              function.   Here  epsilon  denotes the empty input symbol and is
556              distinct from all symbols in Sy; and {S} is the set  of  subsets
557              of  S.  In  other words, T maps a combination of State and Input
558              (which can be empty) to a set of successor states.
559
560       In computer theory a FA is most often shown as a graph where the  nodes
561       represent  the states, and the edges between the nodes encode the tran‐
562       sition function: For all n in S' = T (s, sy) we have one  edge  between
563       the  nodes  representing  s and n resp., labeled with sy. The start and
564       accepting states are encoded through distinct visual markers, i.e. they
565       are attributes of the nodes.
566
567       FA's are used to process streams of symbols over Sy.
568
569       A  specific  FA is said to accept a finite stream sy_1 sy_2 state in St
570       and ending at a state in Fi whose edges have  the  labels  sy_1,  sy_2,
571       etc.  to  sy_n.   The set of all strings accepted by the FA is the lan‐
572       guage of the FA. One important equivalence is that the set of languages
573       which can be accepted by an FA is the set of regular languages.
574
575       Another important concept is that of deterministic FAs. A FA is said to
576       be deterministic if for each string of input symbols there  is  exactly
577       one  path in the graph of the FA beginning at the start state and whose
578       edges are labeled with the symbols in the string.  While it might  seem
579       that  non-deterministic  FAs to have more power of recognition, this is
580       not so. For each non-deterministic FA we can construct a  deterministic
581       FA  which  accepts  the  same language (--> Thompson's subset construc‐
582       tion).
583
584       While one of the premier applications of FAs is in parsing,  especially
585       in  the lexer stage (where symbols == characters), this is not the only
586       possibility by far.
587
588       Quite a lot of processes can be modeled as a FA, albeit with a possibly
589       large  set of states. For these the notion of accepting states is often
590       less or not relevant at all. What is needed instead is the  ability  to
591       act  to  state  changes  in  the  FA,  i.e.  to generate some output in
592       response to the input.  This transforms a FA into a finite  transducer,
593       which  has  an  additional  set OSy of output symbols and also an addi‐
594       tional output function O which maps from "S x (Sy + epsilon)" to  "(Osy
595       + epsilon)", i.e a combination of state and input, possibly empty to an
596       output symbol, or nothing.
597
598       For the graph representation  this  means  that  edges  are  additional
599       labeled  with  the  output  symbol to write when this edge is traversed
600       while matching input. Note that for an application "writing  an  output
601       symbol" can also be "executing some code".
602
603       Transducers  are  not  handled by this package. They will get their own
604       package in the future.
605

BUGS, IDEAS, FEEDBACK

607       This document, and the package it describes, will  undoubtedly  contain
608       bugs and other problems.  Please report such in the category grammar_fa
609       of       the       Tcllib       SF       Trackers       [http://source‐
610       forge.net/tracker/?group_id=12883].   Please  also report any ideas for
611       enhancements you may have for either package and/or documentation.
612

KEYWORDS

614       automaton, finite automaton, grammar, parsing, regular expression, reg‐
615       ular grammar, regular languages, state, transducer
616

COPYRIGHT

618       Copyright (c) 2004-2007 Andreas Kupries <andreas_kupries@users.sourceforge.net>
619
620
621
622
623grammar_fa                            0.3                       grammar::fa(n)