1grammar::fa(n)       Finite automaton operations and usage      grammar::fa(n)
2
3
4
5______________________________________________________________________________
6

NAME

8       grammar::fa - Create and manipulate finite automatons
9

SYNOPSIS

11       package require Tcl  8.4
12
13       package require snit
14
15       package require struct::list
16
17       package require struct::set
18
19       package require grammar::fa::op  ?0.2?
20
21       package require grammar::fa  ?0.2?
22
23       ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
24
25       faName option ?arg arg ...?
26
27       faName destroy
28
29       faName clear
30
31       faName = srcFA
32
33       faName --> dstFA
34
35       faName serialize
36
37       faName deserialize serialization
38
39       faName states
40
41       faName state add s1 ?s2 ...?
42
43       faName state delete s1 ?s2 ...?
44
45       faName state exists s
46
47       faName state rename s snew
48
49       faName startstates
50
51       faName start add s1 ?s2 ...?
52
53       faName start remove s1 ?s2 ...?
54
55       faName start? s
56
57       faName start?set stateset
58
59       faName finalstates
60
61       faName final add s1 ?s2 ...?
62
63       faName final remove s1 ?s2 ...?
64
65       faName final? s
66
67       faName final?set stateset
68
69       faName symbols ?s?
70
71       faName symbolset stateset
72
73       faName symbol add sym1 ?sym2 ...?
74
75       faName symbol delete sym1 ?sym2 ...?
76
77       faName symbol rename sym newsym
78
79       faName symbol exists sym
80
81       faName next s sym ?--> next?
82
83       faName !next s sym ?--> next?
84
85       faName nextset stateset sym
86
87       faName is deterministic
88
89       faName is complete
90
91       faName is useful
92
93       faName is epsilon-free
94
95       faName reachable_states
96
97       faName unreachable_states
98
99       faName reachable s
100
101       faName useful_states
102
103       faName unuseful_states
104
105       faName useful s
106
107       faName epsilon_closure s
108
109       faName reverse
110
111       faName complete
112
113       faName remove_eps
114
115       faName trim ?what?
116
117       faName determinize ?mapvar?
118
119       faName minimize ?mapvar?
120
121       faName complement
122
123       faName kleene
124
125       faName optional
126
127       faName union fa ?mapvar?
128
129       faName intersect fa ?mapvar?
130
131       faName difference fa ?mapvar?
132
133       faName concatenate fa ?mapvar?
134
135       faName fromRegex regex ?over?
136
137_________________________________________________________________
138

DESCRIPTION

140       This  package  provides a container class for finite automatons (Short:
141       FA).  It allows the incremental definition of the automaton, its manip‐
142       ulation  and  querying  of  the definition.  While the package provides
143       complex operations on the automaton (via package  grammar::fa::op),  it
144       does  not have the ability to execute a definition for a stream of sym‐
145       bols.  Use the packages grammar::fa::dacceptor  and  grammar::fa::dexec
146       for that.  Another package related to this is grammar::fa::compiler. It
147       turns a FA into an executor class which has the definition  of  the  FA
148       hardwired into it. The output of this package is configurable to suit a
149       large number of different implementation languages and paradigms.
150
151       For more information about what  a  finite  automaton  is  see  section
152       FINITE AUTOMATONS.
153

API

155       The package exports the API described here.
156
157       ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
158              Creates  a  new  finite  automaton with an associated global Tcl
159              command whose name is faName. This command may be used to invoke
160              various  operations  on the automaton. It has the following gen‐
161              eral form:
162
163              faName option ?arg arg ...?
164                     Option and the args determine the exact behavior  of  the
165                     command.  See  section  FA METHODS for more explanations.
166                     The new automaton will be empty if no src  is  specified.
167                     Otherwise  it  will contain a copy of the definition con‐
168                     tained in the src.  The src has to be a FA object  refer‐
169                     ence  for all operators except deserialize and fromRegex.
170                     The deserialize operator requires src to be  the  serial‐
171                     ization  of  a  FA instead, and fromRegex takes a regular
172                     expression in the form a of a syntax  tree.  See  ::gram‐
173                     mar::fa::op::fromRegex for more detail on that.
174

FA METHODS

176       All automatons provide the following methods for their manipulation:
177
178       faName destroy
179              Destroys  the automaton, including its storage space and associ‐
180              ated command.
181
182       faName clear
183              Clears out the definition of the automaton contained in  faName,
184              but does not destroy the object.
185
186       faName = srcFA
187              Assigns  the  contents  of  the  automaton contained in srcFA to
188              faName,  overwriting  any  existing  definition.   This  is  the
189              assignment operator for automatons. It copies the automaton con‐
190              tained in the FA object srcFA over the automaton  definition  in
191              faName.  The  old  contents of faName are deleted by this opera‐
192              tion.
193
194              This operation is in effect equivalent to
195
196
197                  faName deserialize [srcFA serialize]
198
199
200       faName --> dstFA
201              This is the  reverse  assignment  operator  for  automatons.  It
202              copies  the  automation  contained in the object faName over the
203              automaton definition in the object dstFA.  The old  contents  of
204              dstFA are deleted by this operation.
205
206              This operation is in effect equivalent to
207
208
209                  dstFA deserialize [faName serialize]
210
211
212       faName serialize
213              This  method serializes the automaton stored in faName. In other
214              words it returns a tcl value completely describing that  automa‐
215              ton.   This allows, for example, the transfer of automatons over
216              arbitrary channels, persistence, etc.  This method is  also  the
217              basis for both the copy constructor and the assignment operator.
218
219              The  result of this method has to be semantically identical over
220              all implementations of the grammar::fa interface. This  is  what
221              will  enable us to copy automatons between different implementa‐
222              tions of the same interface.
223
224              The result is a list of three elements with the following struc‐
225              ture:
226
227              [1]    The constant string grammar::fa.
228
229              [2]    A  list  containing the names of all known input symbols.
230                     The order of elements in this list is not relevant.
231
232              [3]    The last item in the list is a  dictionary,  however  the
233                     order  of the keys is important as well. The keys are the
234                     states of the serialized FA, and their order is the order
235                     in which to create the states when deserializing. This is
236                     relevant  to  preserve  the  order  relationship  between
237                     states.
238
239                     The  value  of  each  dictionary entry is a list of three
240                     elements describing the state in more detail.
241
242                     [1]    A boolean flag. If its  value  is  true  then  the
243                            state is a start state, otherwise it is not.
244
245                     [2]    A  boolean  flag.  If  its  value is true then the
246                            state is a final state, otherwise it is not.
247
248                     [3]    The last element is a  dictionary  describing  the
249                            transitions  for  the  state. The keys are symbols
250                            (or the empty string), and the values are sets  of
251                            successor states.
252
253       Assuming  the  following FA (which describes the life of a truck driver
254       in a very simple way :)
255
256
257           Drive -- yellow --> Brake -- red --> (Stop) -- red/yellow --> Attention -- green --> Drive
258           (...) is the start state.
259
260
261       a possible serialization is
262
263
264           grammar::fa \\
265           {yellow red green red/yellow} \\
266           {Drive     {0 0 {yellow     Brake}} \\
267            Brake     {0 0 {red        Stop}} \\
268            Stop      {1 0 {red/yellow Attention}} \\
269            Attention {0 0 {green      Drive}}}
270
271
272       A possible one, because I did not care about creation order here
273
274       faName deserialize serialization
275              This is the complement to serialize. It replaces  the  automaton
276              definition in faName with the automaton described by the serial‐
277              ization value. The old contents of faName are  deleted  by  this
278              operation.
279
280       faName states
281              Returns the set of all states known to faName.
282
283       faName state add s1 ?s2 ...?
284              Adds  the  states  s1,  s2,  et  cetera  to the FA definition in
285              faName. The operation will fail any of the new states is already
286              declared.
287
288       faName state delete s1 ?s2 ...?
289              Deletes the state s1, s2, et cetera, and all associated informa‐
290              tion from the FA definition in faName. The latter means that the
291              information  about  in-  or  outbound  transitions is deleted as
292              well. If the deleted state was a start or final state then  this
293              information  is  invalidated as well. The operation will fail if
294              the state s is not known to the FA.
295
296       faName state exists s
297              A predicate. It tests whether the state s is known to the FA  in
298              faName.   The  result is a boolean value. It will be set to true
299              if the state s is known, and false otherwise.
300
301       faName state rename s snew
302              Renames the state s to snew. Fails if s is not  a  known  state.
303              Also fails if snew is already known as a state.
304
305       faName startstates
306              Returns the set of states which are marked as start states, also
307              known as initial states.  See FINITE AUTOMATONS for explanations
308              what this means.
309
310       faName start add s1 ?s2 ...?
311              Mark the states s1, s2, et cetera in the FA faName as start (aka
312              initial).
313
314       faName start remove s1 ?s2 ...?
315              Mark the states s1, s2, et cetera in the FA faName as not  start
316              (aka not accepting).
317
318       faName start? s
319              A  predicate.  It tests if the state s in the FA faName is start
320              or not.  The result is a boolean value. It will be set  to  true
321              if the state s is start, and false otherwise.
322
323       faName start?set stateset
324              A  predicate. It tests if the set of states stateset contains at
325              least one start state. They operation will fail if the set  con‐
326              tains  an  element  which is not a known state.  The result is a
327              boolean value. It will be set  to  true  if  a  start  state  is
328              present in stateset, and false otherwise.
329
330       faName finalstates
331              Returns the set of states which are marked as final states, also
332              known as accepting states.  See FINITE AUTOMATONS  for  explana‐
333              tions what this means.
334
335       faName final add s1 ?s2 ...?
336              Mark the states s1, s2, et cetera in the FA faName as final (aka
337              accepting).
338
339       faName final remove s1 ?s2 ...?
340              Mark the states s1, s2, et cetera in the FA faName as not  final
341              (aka not accepting).
342
343       faName final? s
344              A  predicate.  It tests if the state s in the FA faName is final
345              or not.  The result is a boolean value. It will be set  to  true
346              if the state s is final, and false otherwise.
347
348       faName final?set stateset
349              A  predicate. It tests if the set of states stateset contains at
350              least one final state. They operation will fail if the set  con‐
351              tains  an  element  which is not a known state.  The result is a
352              boolean value. It will be set  to  true  if  a  final  state  is
353              present in stateset, and false otherwise.
354
355       faName symbols ?s?
356              If  no  state s is specified then it returns the set of all sym‐
357              bols known to the FA faName. Otherwise it returns the set of all
358              symbols for which the state s has transitions. If the empty sym‐
359              bol is present then s has epsilon transitions.
360
361       faName symbolset stateset
362              Returns the set of all symbols for which at least one  state  in
363              the set of states stateset has transitions.  In other words, the
364              union of [faName symbols s] for all states s  in  stateset.   If
365              the empty symbol is present then at least one state contained in
366              stateset has epsilon transitions.
367
368       faName symbol add sym1 ?sym2 ...?
369              Adds the symbols sym1, sym2, et cetera to the FA  definition  in
370              faName.  The  operation  will fail any of the symbols is already
371              declared. The empty string is not allowed as  a  value  for  the
372              symbols.
373
374       faName symbol delete sym1 ?sym2 ...?
375              Deletes  the  symbols  sym1,  sym2 et cetera, and all associated
376              information from the FA definition in faName. The  latter  means
377              that  all transitions using the symbols are deleted as well. The
378              operation will fail if any of the symbols is not  known  to  the
379              FA.
380
381       faName symbol rename sym newsym
382              Renames  the  symbol  sym to newsym. Fails if sym is not a known
383              symbol. Also fails if newsym is already known as a symbol.
384
385       faName symbol exists sym
386              A predicate. It tests whether the symbol sym is known to the  FA
387              in  faName.   The  result  is a boolean value. It will be set to
388              true if the symbol sym is known, and false otherwise.
389
390       faName next s sym ?--> next?
391              Define or query transition information.
392
393              If next is specified, then the method will add a transition from
394              the  state s to the successor state next labeled with the symbol
395              sym to the FA contained in faName. The operation will fail if s,
396              or  next  are not known states, or if sym is not a known symbol.
397              An exception to the latter is that sym  is  allowed  to  be  the
398              empty  string.  In  that  case  the new transition is an epsilon
399              transition which will not  consume  input  when  traversed.  The
400              operation  will  also  fail  if  the combination of (s, sym, and
401              next) is already present in the FA.
402
403              If next was not specified, then the method will return  the  set
404              of  states  which can be reached from s through a single transi‐
405              tion labeled with symbol sym.
406
407       faName !next s sym ?--> next?
408              Remove one or more transitions from the Fa in faName.
409
410              If next was specified then the single transition from the  state
411              s  to the state next labeled with the symbol sym is removed from
412              the FA. Otherwise all transitions originating  in  state  s  and
413              labeled with the symbol sym will be removed.
414
415              The  operation  will  fail  if  s  and/or  next are not known as
416              states. It will also fail if a non-empty sym  is  not  known  as
417              symbol.  The  empty string is acceptable, and allows the removal
418              of epsilon transitions.
419
420       faName nextset stateset sym
421              Returns the set of states which can be reached by a single tran‐
422              sition  originating  in  a state in the set stateset and labeled
423              with the symbol sym.
424
425              In other words, this is the union of [faName next s symbol]  for
426              all states s in stateset.
427
428       faName is deterministic
429              A  predicate. It tests whether the FA in faName is a determinis‐
430              tic FA or not.  The result is a boolean value. It will be set to
431              true if the FA is deterministic, and false otherwise.
432
433       faName is complete
434              A  predicate. It tests whether the FA in faName is a complete FA
435              or not. A FA is complete if it has at least one  transition  per
436              state  and symbol. This also means that a FA without symbols, or
437              states is also complete.  The result is a boolean value. It will
438              be set to true if the FA is deterministic, and false otherwise.
439
440              Note:  When a FA has epsilon-transitions transitions over a sym‐
441              bol for a state S can be indirect, i.e. not attached directly to
442              S,  but  to a state in the epsilon-closure of S. The symbols for
443              such indirect transitions count when computing completeness.
444
445       faName is useful
446              A predicate. It tests whether the FA in faName is an  useful  FA
447              or  not.  A FA is useful if all states are reachable and useful.
448              The result is a boolean value. It will be set to true if the  FA
449              is deterministic, and false otherwise.
450
451       faName is epsilon-free
452              A  predicate.  It  tests whether the FA in faName is an epsilon-
453              free FA or not. A FA is epsilon-free if it has no epsilon  tran‐
454              sitions.  This  definition  means that all deterministic FAs are
455              epsilon-free as well, and epsilon-freeness is a  necessary  pre-
456              condition  for  deterministic'ness.   The  result  is  a boolean
457              value. It will be set to true if the FA  is  deterministic,  and
458              false otherwise.
459
460       faName reachable_states
461              Returns the set of states which are reachable from a start state
462              by one or more transitions.
463
464       faName unreachable_states
465              Returns the set of states which are not reachable from any start
466              state by any number of transitions. This is
467
468
469                    [faName states] - [faName reachable_states]
470
471
472       faName reachable s
473              A  predicate.  It tests whether the state s in the FA faName can
474              be reached from a start state by one or more  transitions.   The
475              result  is  a boolean value. It will be set to true if the state
476              can be reached, and false otherwise.
477
478       faName useful_states
479              Returns the set of states which are able to reach a final  state
480              by one or more transitions.
481
482       faName unuseful_states
483              Returns  the  set  of states which are not able to reach a final
484              state by any number of transitions. This is
485
486
487                    [faName states] - [faName useful_states]
488
489
490       faName useful s
491              A predicate. It tests whether the state s in the  FA  faName  is
492              able  to  reach  a  final state by one or more transitions.  The
493              result is a boolean value. It will be set to true if  the  state
494              is useful, and false otherwise.
495
496       faName epsilon_closure s
497              Returns  the  set of states which are reachable from the state s
498              in the FA faName by one or more epsilon transitions, i.e transi‐
499              tions  over  the  empty symbol, transitions which do not consume
500              input. This is called the epsilon closure of s.
501
502       faName reverse
503
504       faName complete
505
506       faName remove_eps
507
508       faName trim ?what?
509
510       faName determinize ?mapvar?
511
512       faName minimize ?mapvar?
513
514       faName complement
515
516       faName kleene
517
518       faName optional
519
520       faName union fa ?mapvar?
521
522       faName intersect fa ?mapvar?
523
524       faName difference fa ?mapvar?
525
526       faName concatenate fa ?mapvar?
527
528       faName fromRegex regex ?over?
529              These methods provide more complex operations on the FA.  Please
530              see  the  same-named commands in the package grammar::fa::op for
531              descriptions of what they do.
532

EXAMPLES

FINITE AUTOMATONS

535       For the mathematically inclined, a FA is a 5-tuple (S,Sy,St,Fi,T) where
536
537       ·      S is a set of states,
538
539       ·      Sy a set of input symbols,
540
541       ·      St is a subset of S, the set of start states, also known as ini‐
542              tial states.
543
544       ·      Fi  is  a  subset  of  S, the set of final states, also known as
545              accepting.
546
547       ·      T is a function from S x (Sy + epsilon) to {S},  the  transition
548              function.   Here  epsilon  denotes the empty input symbol and is
549              distinct from all symbols in Sy; and {S} is the set  of  subsets
550              of  S.  In  other words, T maps a combination of State and Input
551              (which can be empty) to a set of successor states.
552
553       In computer theory a FA is most often shown as a graph where the  nodes
554       represent  the states, and the edges between the nodes encode the tran‐
555       sition function: For all n in S' = T (s, sy) we have one  edge  between
556       the  nodes  representing  s and n resp., labeled with sy. The start and
557       accepting states are encoded through distinct visual markers, i.e. they
558       are attributes of the nodes.
559
560       FA's are used to process streams of symbols over Sy.
561
562       A  specific  FA is said to accept a finite stream sy_1 sy_2 state in St
563       and ending at a state in Fi whose edges have  the  labels  sy_1,  sy_2,
564       etc.  to  sy_n.   The set of all strings accepted by the FA is the lan‐
565       guage of the FA. One important equivalence is that the set of languages
566       which can be accepted by an FA is the set of regular languages.
567
568       Another important concept is that of deterministic FAs. A FA is said to
569       be deterministic if for each string of input symbols there  is  exactly
570       one  path in the graph of the FA beginning at the start state and whose
571       edges are labeled with the symbols in the string.  While it might  seem
572       that  non-deterministic  FAs to have more power of recognition, this is
573       not so. For each non-deterministic FA we can construct a  deterministic
574       FA  which  accepts  the  same language (--> Thompson's subset construc‐
575       tion).
576
577       While one of the premier applications of FAs is in parsing,  especially
578       in  the lexer stage (where symbols == characters), this is not the only
579       possibility by far.
580
581       Quite a lot of processes can be modeled as a FA, albeit with a possibly
582       large  set of states. For these the notion of accepting states is often
583       less or not relevant at all. What is needed instead is the  ability  to
584       act  to  state  changes  in  the  FA,  i.e.  to generate some output in
585       response to the input.  This transforms a FA into a finite  transducer,
586       which  has  an  additional  set OSy of output symbols and also an addi‐
587       tional output function O which maps from "S x (Sy + epsilon)" to  "(Osy
588       + epsilon)", i.e a combination of state and input, possibly empty to an
589       output symbol, or nothing.
590
591       For the graph representation  this  means  that  edges  are  additional
592       labeled  with  the  output  symbol to write when this edge is traversed
593       while matching input. Note that for an application "writing  an  output
594       symbol" can also be "executing some code".
595
596       Transducers  are  not  handled by this package. They will get their own
597       package in the future.
598

KEYWORDS

600       automaton, finite automaton, grammar, parsing, regular expression, reg‐
601       ular grammar, regular languages, state, transducer
602
604       Copyright (c) 2004 Andreas Kupries <andreas_kupries@users.sourceforge.net>
605
606
607
608
609grammar_fa                            0.2                       grammar::fa(n)
Impressum