1grammar::fa(n)       Finite automaton operations and usage      grammar::fa(n)
2
3
4
5______________________________________________________________________________
6

NAME

8       grammar::fa - Create and manipulate finite automatons
9

SYNOPSIS

11       package require Tcl  8.4
12
13       package require snit  1.3
14
15       package require struct::list
16
17       package require struct::set
18
19       package require grammar::fa::op  ?0.2?
20
21       package require grammar::fa  ?0.4?
22
23       ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
24
25       faName option ?arg arg ...?
26
27       faName destroy
28
29       faName clear
30
31       faName = srcFA
32
33       faName --> dstFA
34
35       faName serialize
36
37       faName deserialize serialization
38
39       faName states
40
41       faName state add s1 ?s2 ...?
42
43       faName state delete s1 ?s2 ...?
44
45       faName state exists s
46
47       faName state rename s snew
48
49       faName startstates
50
51       faName start add s1 ?s2 ...?
52
53       faName start remove s1 ?s2 ...?
54
55       faName start? s
56
57       faName start?set stateset
58
59       faName finalstates
60
61       faName final add s1 ?s2 ...?
62
63       faName final remove s1 ?s2 ...?
64
65       faName final? s
66
67       faName final?set stateset
68
69       faName symbols
70
71       faName symbols@ s ?d?
72
73       faName symbols@set stateset
74
75       faName symbol add sym1 ?sym2 ...?
76
77       faName symbol delete sym1 ?sym2 ...?
78
79       faName symbol rename sym newsym
80
81       faName symbol exists sym
82
83       faName next s sym ?--> next?
84
85       faName !next s sym ?--> next?
86
87       faName nextset stateset sym
88
89       faName is deterministic
90
91       faName is complete
92
93       faName is useful
94
95       faName is epsilon-free
96
97       faName reachable_states
98
99       faName unreachable_states
100
101       faName reachable s
102
103       faName useful_states
104
105       faName unuseful_states
106
107       faName useful s
108
109       faName epsilon_closure s
110
111       faName reverse
112
113       faName complete
114
115       faName remove_eps
116
117       faName trim ?what?
118
119       faName determinize ?mapvar?
120
121       faName minimize ?mapvar?
122
123       faName complement
124
125       faName kleene
126
127       faName optional
128
129       faName union fa ?mapvar?
130
131       faName intersect fa ?mapvar?
132
133       faName difference fa ?mapvar?
134
135       faName concatenate fa ?mapvar?
136
137       faName fromRegex regex ?over?
138
139______________________________________________________________________________
140

DESCRIPTION

142       This  package  provides a container class for finite automatons (Short:
143       FA).  It allows the incremental definition of the automaton, its manip‐
144       ulation  and  querying  of  the definition.  While the package provides
145       complex operations on the automaton (via package  grammar::fa::op),  it
146       does  not have the ability to execute a definition for a stream of sym‐
147       bols.  Use the packages grammar::fa::dacceptor  and  grammar::fa::dexec
148       for that.  Another package related to this is grammar::fa::compiler. It
149       turns a FA into an executor class which has the definition  of  the  FA
150       hardwired into it. The output of this package is configurable to suit a
151       large number of different implementation languages and paradigms.
152
153       For more information about what a finite automaton is see  section  FI‐
154       NITE AUTOMATONS.
155

API

157       The package exports the API described here.
158
159       ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
160              Creates  a  new  finite  automaton with an associated global Tcl
161              command whose name is faName. This command may be used to invoke
162              various  operations  on the automaton. It has the following gen‐
163              eral form:
164
165              faName option ?arg arg ...?
166                     Option and the args determine the exact behavior  of  the
167                     command.  See  section  FA METHODS for more explanations.
168                     The new automaton will be empty if no src  is  specified.
169                     Otherwise  it  will contain a copy of the definition con‐
170                     tained in the src.  The src has to be a FA object  refer‐
171                     ence  for all operators except deserialize and fromRegex.
172                     The deserialize operator requires src to be  the  serial‐
173                     ization  of  a  FA instead, and fromRegex takes a regular
174                     expression in the form a of a syntax  tree.  See  ::gram‐
175                     mar::fa::op::fromRegex for more detail on that.
176

FA METHODS

178       All automatons provide the following methods for their manipulation:
179
180       faName destroy
181              Destroys  the automaton, including its storage space and associ‐
182              ated command.
183
184       faName clear
185              Clears out the definition of the automaton contained in  faName,
186              but does not destroy the object.
187
188       faName = srcFA
189              Assigns  the  contents  of  the  automaton contained in srcFA to
190              faName, overwriting any existing definition.  This  is  the  as‐
191              signment  operator  for automatons. It copies the automaton con‐
192              tained in the FA object srcFA over the automaton  definition  in
193              faName.  The  old  contents of faName are deleted by this opera‐
194              tion.
195
196              This operation is in effect equivalent to
197
198
199
200                  faName deserialize [srcFA serialize]
201
202
203       faName --> dstFA
204              This is the  reverse  assignment  operator  for  automatons.  It
205              copies  the  automation  contained in the object faName over the
206              automaton definition in the object dstFA.  The old  contents  of
207              dstFA are deleted by this operation.
208
209              This operation is in effect equivalent to
210
211
212
213                  dstFA deserialize [faName serialize]
214
215
216       faName serialize
217              This  method serializes the automaton stored in faName. In other
218              words it returns a tcl value completely describing that  automa‐
219              ton.   This allows, for example, the transfer of automatons over
220              arbitrary channels, persistence, etc.  This method is  also  the
221              basis for both the copy constructor and the assignment operator.
222
223              The  result of this method has to be semantically identical over
224              all implementations of the grammar::fa interface. This  is  what
225              will  enable us to copy automatons between different implementa‐
226              tions of the same interface.
227
228              The result is a list of three elements with the following struc‐
229              ture:
230
231              [1]    The constant string grammar::fa.
232
233              [2]    A  list  containing the names of all known input symbols.
234                     The order of elements in this list is not relevant.
235
236              [3]    The last item in the list is a  dictionary,  however  the
237                     order  of the keys is important as well. The keys are the
238                     states of the serialized FA, and their order is the order
239                     in which to create the states when deserializing. This is
240                     relevant  to  preserve  the  order  relationship  between
241                     states.
242
243                     The value of each dictionary entry is a list of three el‐
244                     ements describing the state in more detail.
245
246                     [1]    A boolean flag. If its  value  is  true  then  the
247                            state is a start state, otherwise it is not.
248
249                     [2]    A  boolean  flag.  If  its  value is true then the
250                            state is a final state, otherwise it is not.
251
252                     [3]    The last element is a  dictionary  describing  the
253                            transitions  for  the  state. The keys are symbols
254                            (or the empty string), and the values are sets  of
255                            successor states.
256
257       Assuming  the  following FA (which describes the life of a truck driver
258       in a very simple way :)
259
260
261
262                  Drive -- yellow --> Brake -- red --> (Stop) -- red/yellow --> Attention -- green --> Drive
263                  (...) is the start state.
264
265
266       a possible serialization is
267
268
269
270                  grammar::fa \
271                  {yellow red green red/yellow} \
272                  {Drive     {0 0 {yellow     Brake}} \
273                   Brake     {0 0 {red        Stop}} \
274                   Stop      {1 0 {red/yellow Attention}} \
275                   Attention {0 0 {green      Drive}}}
276
277
278       A possible one, because I did not care about creation order here
279
280       faName deserialize serialization
281              This is the complement to serialize. It replaces  the  automaton
282              definition in faName with the automaton described by the serial‐
283              ization value. The old contents of faName are  deleted  by  this
284              operation.
285
286       faName states
287              Returns the set of all states known to faName.
288
289       faName state add s1 ?s2 ...?
290              Adds  the  states  s1,  s2,  et  cetera  to the FA definition in
291              faName. The operation will fail any of the new states is already
292              declared.
293
294       faName state delete s1 ?s2 ...?
295              Deletes the state s1, s2, et cetera, and all associated informa‐
296              tion from the FA definition in faName. The latter means that the
297              information  about  in-  or  outbound  transitions is deleted as
298              well. If the deleted state was a start or final state then  this
299              information  is  invalidated as well. The operation will fail if
300              the state s is not known to the FA.
301
302       faName state exists s
303              A predicate. It tests whether the state s is known to the FA  in
304              faName.   The  result is a boolean value. It will be set to true
305              if the state s is known, and false otherwise.
306
307       faName state rename s snew
308              Renames the state s to snew. Fails if s is not  a  known  state.
309              Also fails if snew is already known as a state.
310
311       faName startstates
312              Returns the set of states which are marked as start states, also
313              known as initial states.  See FINITE AUTOMATONS for explanations
314              what this means.
315
316       faName start add s1 ?s2 ...?
317              Mark the states s1, s2, et cetera in the FA faName as start (aka
318              initial).
319
320       faName start remove s1 ?s2 ...?
321              Mark the states s1, s2, et cetera in the FA faName as not  start
322              (aka not accepting).
323
324       faName start? s
325              A  predicate.  It tests if the state s in the FA faName is start
326              or not.  The result is a boolean value. It will be set  to  true
327              if the state s is start, and false otherwise.
328
329       faName start?set stateset
330              A  predicate. It tests if the set of states stateset contains at
331              least one start state. They operation will fail if the set  con‐
332              tains  an  element  which is not a known state.  The result is a
333              boolean value. It will be set  to  true  if  a  start  state  is
334              present in stateset, and false otherwise.
335
336       faName finalstates
337              Returns the set of states which are marked as final states, also
338              known as accepting states.  See FINITE AUTOMATONS  for  explana‐
339              tions what this means.
340
341       faName final add s1 ?s2 ...?
342              Mark the states s1, s2, et cetera in the FA faName as final (aka
343              accepting).
344
345       faName final remove s1 ?s2 ...?
346              Mark the states s1, s2, et cetera in the FA faName as not  final
347              (aka not accepting).
348
349       faName final? s
350              A  predicate.  It tests if the state s in the FA faName is final
351              or not.  The result is a boolean value. It will be set  to  true
352              if the state s is final, and false otherwise.
353
354       faName final?set stateset
355              A  predicate. It tests if the set of states stateset contains at
356              least one final state. They operation will fail if the set  con‐
357              tains  an  element  which is not a known state.  The result is a
358              boolean value. It will be set  to  true  if  a  final  state  is
359              present in stateset, and false otherwise.
360
361       faName symbols
362              Returns the set of all symbols known to the FA faName.
363
364       faName symbols@ s ?d?
365              Returns the set of all symbols for which the state s has transi‐
366              tions.  If the empty symbol is present then s has epsilon  tran‐
367              sitions.  If  two  states are specified the result is the set of
368              symbols which have transitions from s to  t.  This  set  may  be
369              empty  if  there  are  no  transitions between the two specified
370              states.
371
372       faName symbols@set stateset
373              Returns the set of all symbols for which at least one  state  in
374              the set of states stateset has transitions.  In other words, the
375              union of [faName symbols@ s] for all states s in  stateset.   If
376              the empty symbol is present then at least one state contained in
377              stateset has epsilon transitions.
378
379       faName symbol add sym1 ?sym2 ...?
380              Adds the symbols sym1, sym2, et cetera to the FA  definition  in
381              faName.  The  operation  will fail any of the symbols is already
382              declared. The empty string is not allowed as  a  value  for  the
383              symbols.
384
385       faName symbol delete sym1 ?sym2 ...?
386              Deletes the symbols sym1, sym2 et cetera, and all associated in‐
387              formation from the FA definition in  faName.  The  latter  means
388              that  all transitions using the symbols are deleted as well. The
389              operation will fail if any of the symbols is not  known  to  the
390              FA.
391
392       faName symbol rename sym newsym
393              Renames  the  symbol  sym to newsym. Fails if sym is not a known
394              symbol. Also fails if newsym is already known as a symbol.
395
396       faName symbol exists sym
397              A predicate. It tests whether the symbol sym is known to the  FA
398              in  faName.   The  result  is a boolean value. It will be set to
399              true if the symbol sym is known, and false otherwise.
400
401       faName next s sym ?--> next?
402              Define or query transition information.
403
404              If next is specified, then the method will add a transition from
405              the  state s to the successor state next labeled with the symbol
406              sym to the FA contained in faName. The operation will fail if s,
407              or  next  are not known states, or if sym is not a known symbol.
408              An exception to the latter is that sym  is  allowed  to  be  the
409              empty  string.  In  that  case  the new transition is an epsilon
410              transition which will not consume input when traversed. The  op‐
411              eration  will also fail if the combination of (s, sym, and next)
412              is already present in the FA.
413
414              If next was not specified, then the method will return  the  set
415              of  states  which can be reached from s through a single transi‐
416              tion labeled with symbol sym.
417
418       faName !next s sym ?--> next?
419              Remove one or more transitions from the Fa in faName.
420
421              If next was specified then the single transition from the  state
422              s  to the state next labeled with the symbol sym is removed from
423              the FA. Otherwise all transitions originating in state s and la‐
424              beled with the symbol sym will be removed.
425
426              The  operation  will  fail  if  s  and/or  next are not known as
427              states. It will also fail if a non-empty sym  is  not  known  as
428              symbol.  The  empty string is acceptable, and allows the removal
429              of epsilon transitions.
430
431       faName nextset stateset sym
432              Returns the set of states which can be reached by a single tran‐
433              sition  originating  in  a state in the set stateset and labeled
434              with the symbol sym.
435
436              In other words, this is the union of [faName next s symbol]  for
437              all states s in stateset.
438
439       faName is deterministic
440              A  predicate. It tests whether the FA in faName is a determinis‐
441              tic FA or not.  The result is a boolean value. It will be set to
442              true if the FA is deterministic, and false otherwise.
443
444       faName is complete
445              A  predicate. It tests whether the FA in faName is a complete FA
446              or not. A FA is complete if it has at least one  transition  per
447              state  and symbol. This also means that a FA without symbols, or
448              states is also complete.  The result is a boolean value. It will
449              be set to true if the FA is deterministic, and false otherwise.
450
451              Note:  When a FA has epsilon-transitions transitions over a sym‐
452              bol for a state S can be indirect, i.e. not attached directly to
453              S,  but  to a state in the epsilon-closure of S. The symbols for
454              such indirect transitions count when computing completeness.
455
456       faName is useful
457              A predicate. It tests whether the FA in faName is an  useful  FA
458              or  not.  A FA is useful if all states are reachable and useful.
459              The result is a boolean value. It will be set to true if the  FA
460              is deterministic, and false otherwise.
461
462       faName is epsilon-free
463              A  predicate.  It  tests whether the FA in faName is an epsilon-
464              free FA or not. A FA is epsilon-free if it has no epsilon  tran‐
465              sitions.  This  definition  means that all deterministic FAs are
466              epsilon-free as well, and epsilon-freeness is a  necessary  pre-
467              condition  for  deterministic'ness.   The  result  is  a boolean
468              value. It will be set to true if the FA  is  deterministic,  and
469              false otherwise.
470
471       faName reachable_states
472              Returns the set of states which are reachable from a start state
473              by one or more transitions.
474
475       faName unreachable_states
476              Returns the set of states which are not reachable from any start
477              state by any number of transitions. This is
478
479
480
481                 [faName states] - [faName reachable_states]
482
483
484       faName reachable s
485              A  predicate.  It tests whether the state s in the FA faName can
486              be reached from a start state by one or more  transitions.   The
487              result  is  a boolean value. It will be set to true if the state
488              can be reached, and false otherwise.
489
490       faName useful_states
491              Returns the set of states which are able to reach a final  state
492              by one or more transitions.
493
494       faName unuseful_states
495              Returns  the  set  of states which are not able to reach a final
496              state by any number of transitions. This is
497
498
499
500                 [faName states] - [faName useful_states]
501
502
503       faName useful s
504              A predicate. It tests whether the state s in the  FA  faName  is
505              able to reach a final state by one or more transitions.  The re‐
506              sult is a boolean value. It will be set to true if the state  is
507              useful, and false otherwise.
508
509       faName epsilon_closure s
510              Returns  the  set of states which are reachable from the state s
511              in the FA faName by one or more epsilon transitions, i.e transi‐
512              tions  over  the  empty symbol, transitions which do not consume
513              input. This is called the epsilon closure of s.
514
515       faName reverse
516
517       faName complete
518
519       faName remove_eps
520
521       faName trim ?what?
522
523       faName determinize ?mapvar?
524
525       faName minimize ?mapvar?
526
527       faName complement
528
529       faName kleene
530
531       faName optional
532
533       faName union fa ?mapvar?
534
535       faName intersect fa ?mapvar?
536
537       faName difference fa ?mapvar?
538
539       faName concatenate fa ?mapvar?
540
541       faName fromRegex regex ?over?
542              These methods provide more complex operations on the FA.  Please
543              see  the  same-named commands in the package grammar::fa::op for
544              descriptions of what they do.
545

EXAMPLES

FINITE AUTOMATONS

548       For the mathematically inclined, a FA is a 5-tuple (S,Sy,St,Fi,T) where
549
550       •      S is a set of states,
551
552       •      Sy a set of input symbols,
553
554       •      St is a subset of S, the set of start states, also known as ini‐
555              tial states.
556
557       •      Fi  is a subset of S, the set of final states, also known as ac‐
558              cepting.
559
560       •      T is a function from S x (Sy + epsilon) to {S},  the  transition
561              function.   Here  epsilon  denotes the empty input symbol and is
562              distinct from all symbols in Sy; and {S} is the set  of  subsets
563              of  S.  In  other words, T maps a combination of State and Input
564              (which can be empty) to a set of successor states.
565
566       In computer theory a FA is most often shown as a graph where the  nodes
567       represent  the states, and the edges between the nodes encode the tran‐
568       sition function: For all n in S' = T (s, sy) we have one  edge  between
569       the  nodes  representing  s and n resp., labeled with sy. The start and
570       accepting states are encoded through distinct visual markers, i.e. they
571       are attributes of the nodes.
572
573       FA's are used to process streams of symbols over Sy.
574
575       A  specific  FA is said to accept a finite stream sy_1 sy_2 ... sy_n if
576       there is a path in the graph of the FA beginning at a state in  St  and
577       ending at a state in Fi whose edges have the labels sy_1, sy_2, etc. to
578       sy_n.  The set of all strings accepted by the FA is the language of the
579       FA. One important equivalence is that the set of languages which can be
580       accepted by an FA is the set of regular languages.
581
582       Another important concept is that of deterministic FAs. A FA is said to
583       be  deterministic  if for each string of input symbols there is exactly
584       one path in the graph of the FA beginning at the start state and  whose
585       edges  are labeled with the symbols in the string.  While it might seem
586       that non-deterministic FAs to have more power of recognition,  this  is
587       not  so. For each non-deterministic FA we can construct a deterministic
588       FA which accepts the same language  (-->  Thompson's  subset  construc‐
589       tion).
590
591       While  one of the premier applications of FAs is in parsing, especially
592       in the lexer stage (where symbols == characters), this is not the  only
593       possibility by far.
594
595       Quite a lot of processes can be modeled as a FA, albeit with a possibly
596       large set of states. For these the notion of accepting states is  often
597       less  or  not relevant at all. What is needed instead is the ability to
598       act to state changes in the FA, i.e. to generate  some  output  in  re‐
599       sponse  to  the  input.  This transforms a FA into a finite transducer,
600       which has an additional set OSy of output symbols  and  also  an  addi‐
601       tional  output function O which maps from "S x (Sy + epsilon)" to "(Osy
602       + epsilon)", i.e a combination of state and input, possibly empty to an
603       output symbol, or nothing.
604
605       For  the  graph representation this means that edges are additional la‐
606       beled with the output symbol to write when this edge is traversed while
607       matching input. Note that for an application "writing an output symbol"
608       can also be "executing some code".
609
610       Transducers are not handled by this package. They will  get  their  own
611       package in the future.
612

BUGS, IDEAS, FEEDBACK

614       This  document,  and the package it describes, will undoubtedly contain
615       bugs and other problems.  Please report such in the category grammar_fa
616       of  the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].  Please
617       also report any ideas for enhancements you may have for either  package
618       and/or documentation.
619
620       When proposing code changes, please provide unified diffs, i.e the out‐
621       put of diff -u.
622
623       Note further that  attachments  are  strongly  preferred  over  inlined
624       patches.  Attachments  can  be  made  by  going to the Edit form of the
625       ticket immediately after its creation, and  then  using  the  left-most
626       button in the secondary navigation bar.
627

KEYWORDS

629       automaton, finite automaton, grammar, parsing, regular expression, reg‐
630       ular grammar, regular languages, state, transducer
631

CATEGORY

633       Grammars and finite automata
634
636       Copyright (c) 2004-2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>
637
638
639
640
641tcllib                                0.4                       grammar::fa(n)
Impressum