RAGEL(1)                 Ragel State Machine Compiler                 RAGEL(1)

NAME

       ragel - compile regular languages into executable state machines

SYNOPSIS

       ragel [options] file

DESCRIPTION

       Ragel compiles executable finite state machines from regular languages.
       Ragel can generate C, C++, Objective-C, D, Java, Ruby, or C# code. Ragel
       state machines can not only recognize byte sequences, as regular
       expression machines do, but can also execute code at arbitrary points
       in the recognition of a regular language. User code is embedded using
       inline operators that do not disrupt the regular language syntax.

       The core language consists of standard regular expression operators,
       such as union, concatenation, and Kleene star, accompanied by action
       embedding operators. Ragel also provides operators that let you control
       any non-determinism that you create, construct scanners using the
       longest-match paradigm, and build state machines using the statechart
       model. It is also possible to influence the execution of a state
       machine from inside an embedded action by jumping or calling to other
       parts of the machine and by reprocessing input.

       Ragel provides a very flexible interface to the host language that
       attempts to place minimal restrictions on how the generated code is
       used and integrated into the application. The generated code has no
       dependencies.

OPTIONS

       -h, -H, -?, --help
              Display help and exit.

       -v     Print version information and exit.

       -o file
              Write output to file. If -o is not given, a default file name is
              chosen by replacing the file extension of the input file. For
              source files ending in .rh the suffix .h is used. For all other
              source files a suffix based on the output language is used (.c,
              .cpp, .m, etc.). If -o is not given for Graphviz output, the
              generated dot file is written to standard output.

       -s     Print some statistics on standard error.

       --error-format=gnu
              Print error messages using the format "file:line:column:"
              (default).

       --error-format=msvc
              Print error messages using the format "file(line,column):".

       -d     Do not remove duplicate actions from action lists.

       -I dir
              Add dir to the list of directories to search for included and
              imported files.

       -n     Do not perform state minimization.

       -m     Perform minimization once, at the end of the state machine
              compilation.

       -l     Minimize after nearly every operation. Lists of like operations
              such as unions are minimized once at the end. This is the
              default minimization option.

       -e     Minimize after every operation.

       -x     Compile the state machines and emit an XML representation of the
              host data and the machines.

       -V     Generate a dot file for Graphviz.

       -p     Display printable characters on labels.

       -S <spec>
              FSM specification to output.

       -M <machine>
              Machine definition/instantiation to output.

       -C     The host language is C, C++, Obj-C or Obj-C++. This is the
              default host language option.

       -D     The host language is D.

       -J     The host language is Java.

       -R     The host language is Ruby.

       -L     Inhibit writing of #line directives.

       -T0    (C/D/Java/Ruby/C#) Generate a table driven FSM. This is the
              default code style. The table driven FSM represents the state
              machine as static data. There are tables of states, transitions,
              indices and actions. The current state is stored in a variable.
              Execution is a loop that, given the current state and the
              current character to process, looks up the transition to take
              using a binary search, executes any actions, and moves to the
              target state. In general, the table driven FSM produces a
              smaller binary and requires a less expensive host language
              compile, but results in slower running code. The table driven
              FSM is suitable for any FSM.

       -T1    (C/D/Ruby/C#) Generate a faster table driven FSM by expanding
              action lists in the action execute code.

       -F0    (C/D/Ruby/C#) Generate a flat table driven FSM. Transitions are
              represented as an array indexed by the current alphabet
              character. This eliminates the need for a binary search to
              locate transitions and produces faster code; however, it is only
              suitable for small alphabets.

       -F1    (C/D/Ruby/C#) Generate a faster flat table driven FSM by
              expanding action lists in the action execute code.

       -G0    (C/D/C#) Generate a goto driven FSM. The goto driven FSM
              represents the state machine as a series of goto statements.
              While in the machine, the current state is stored by the
              processor's instruction pointer. The execution is a flat
              function where control is passed from state to state using
              gotos. In general, the goto FSM produces faster code but results
              in a larger binary and a more expensive host language compile.

       -G1    (C/D/C#) Generate a faster goto driven FSM by expanding action
              lists in the action execute code.

       -G2    (C/D) Generate a really fast goto driven FSM by embedding action
              lists in the state machine control code.

       -P<N>  (C/D) N-Way Split really fast goto-driven FSM.

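       As a rough illustration of the table driven (-T0) and flat table (-F0)
       styles described above, the following hand-written C sketch recognizes
       the language [+\-]? digit+ both ways. It is not Ragel output: the
       table layout and every name in it (trans_table, flat_table,
       next_binary, next_flat, match_binary, match_flat) are assumptions made
       for the example.

```c
#include <stddef.h>

/* Hand-written sketch (not Ragel output) for the language [+\-]? digit+.
   States: 0 = start, 1 = after sign, 2 = in digits (final).
   All names are illustrative. */
enum { ST_START = 0, ST_FINAL = 2, ST_ERROR = -1 };

/* -T0 style: each state owns a sorted slice of key ranges. */
struct trans { unsigned char lo, hi; int target; };

static const struct trans trans_table[] = {
    { '+', '+', 1 }, { '-', '-', 1 }, { '0', '9', 2 },  /* state 0 */
    { '0', '9', 2 },                                    /* state 1 */
    { '0', '9', 2 },                                    /* state 2 */
};
static const int trans_offset[] = { 0, 3, 4 };
static const int trans_count[]  = { 3, 1, 1 };

/* Binary search the current state's slice for a range containing c. */
static int next_binary(int cs, unsigned char c)
{
    int lo = trans_offset[cs];
    int hi = lo + trans_count[cs] - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (c < trans_table[mid].lo)
            hi = mid - 1;
        else if (c > trans_table[mid].hi)
            lo = mid + 1;
        else
            return trans_table[mid].target;
    }
    return ST_ERROR;
}

/* -F0 style: one array slot per character in the key span '+'..'9'. */
enum { KEY_LO = '+', KEY_HI = '9', SPAN = KEY_HI - KEY_LO + 1 };

static const signed char flat_table[3][SPAN] = {
    /*  +   ,   -   .   /  0  1  2  3  4  5  6  7  8  9 */
    {   1, -1,  1, -1, -1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 },  /* state 0 */
    {  -1, -1, -1, -1, -1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 },  /* state 1 */
    {  -1, -1, -1, -1, -1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 },  /* state 2 */
};

/* Direct array lookup: no search, but the table grows with the span. */
static int next_flat(int cs, unsigned char c)
{
    if (c < KEY_LO || c > KEY_HI)
        return ST_ERROR;
    return flat_table[cs][c - KEY_LO];
}

/* Shared driver loop: the current state lives in the variable cs.
   Returns 1 when the whole buffer matches, 0 otherwise. The digit
   counter stands in for an action on transitions into state 2. */
static int run(int (*next)(int, unsigned char),
               const char *p, size_t len, int *digits)
{
    const char *pe = p + len;
    int cs = ST_START;
    *digits = 0;
    for (; p < pe; p++) {
        cs = next(cs, (unsigned char)*p);
        if (cs == ST_ERROR)
            return 0;
        if (cs == 2)
            (*digits)++;
    }
    return cs >= ST_FINAL;
}

int match_binary(const char *p, size_t len, int *digits)
{
    return run(next_binary, p, len, digits);
}

int match_flat(const char *p, size_t len, int *digits)
{
    return run(next_flat, p, len, digits);
}
```

       With these definitions, match_binary("-42", 3, &n) and
       match_flat("-42", 3, &n) both return 1 and set n to 2. The flat lookup
       replaces the binary search with one array slot per character in the
       key span, which is why that style only suits small alphabets.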
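       The goto driven style (-G0) can be pictured with a hand-written C
       sketch for the language [+\-]? digit+. This is only an illustration
       under assumed names (match_goto and the st_ labels), not code produced
       by Ragel. Each state becomes a label, and the current state is implied
       by where execution is rather than being kept in a variable.

```c
#include <stddef.h>

/* Hand-written sketch (not Ragel output) of a goto-driven recognizer
   for the expression  [+\-]? digit+ .
   Returns 1 when the whole buffer matches, 0 otherwise. */
int match_goto(const char *p, size_t len)
{
    const char *pe = p + len;

    /* Start state: nothing consumed yet. */
    if (p == pe)
        return 0;               /* input ended in a non-final state */
    if (*p == '+' || *p == '-') { p++; goto st_sign; }
    if (*p >= '0' && *p <= '9') { p++; goto st_digits; }
    return 0;                   /* no transition on this character */

st_sign:                        /* a sign has been consumed */
    if (p == pe)
        return 0;
    if (*p >= '0' && *p <= '9') { p++; goto st_digits; }
    return 0;

st_digits:                      /* final state: one or more digits seen */
    if (p == pe)
        return 1;               /* input ended in a final state */
    if (*p >= '0' && *p <= '9') { p++; goto st_digits; }
    return 0;
}
```

       For example, match_goto("-42", 3) returns 1 and match_goto("+", 1)
       returns 0. Every state carries its own copy of the transition tests,
       which is consistent with the larger binary noted for this style.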

RAGEL INPUT

       NOTE: This is a very brief description of Ragel input. Ragel is
       described in more detail in the user guide available from the homepage
       (see below).

       Ragel normally passes input files straight to the output. When it sees
       an FSM specification that contains machine instantiations, it stops to
       generate the state machine. If there are write statements (such as
       "write exec"), then Ragel emits the corresponding code. There can be
       any number of FSM specifications in an input file. A multi-line FSM
       specification starts with '%%{' and ends with '}%%'. A single-line FSM
       specification starts with '%%' and ends at the first newline.

FSM STATEMENTS

       Machine Name:
              Set the name of the machine. If given, it must be the first
              statement.

       Alphabet Type:
              Set the data type of the alphabet.

       GetKey:
              Specify how to retrieve the alphabet character from the element
              type.

       Include:
              Include a machine of the same name as the current machine, or
              of a different name, from either the current file or some other
              file.

       Action Definition:
              Define an action that can be invoked by the FSM.

       FSM Definition, Instantiation and Longest Match Instantiation:
              Used to build FSMs. The syntax is described in the next few
              sections.

       Access:
              Specify how to access the persistent state machine variables.

       Write: Write some component of the machine.

       Variable:
              Override the default variable names (p, pe, cs, act, etc.).

BASIC MACHINES

       The basic machines are the base operands of the regular language
       expressions.

       'hello'
              Concat literal. Produces a concatenation of the characters in
              the string. Supports escape sequences with '\'. The result will
              have a start state and a transition to a new state for each
              character in the string. The last state in the sequence will be
              made final. To make the string case-insensitive, append an 'i'
              to the string, as in 'cmd'i.

       "hello"
              Identical to the single-quote version.

       [hello]
              Or literal. Produces a union of characters. Supports character
              ranges with '-', negating the sense of the union with an initial
              '^', and escape sequences with '\'. The result will have two
              states with a transition between them for each character or
              range.

       NOTE: '', "", and [] produce null FSMs. Null machines have one state
       that is both a start state and a final state and match the zero-length
       string. A null machine may be created with the null builtin machine.

       integer
              Makes a two-state machine with one transition on the given
              integer number.

       hex    Makes a two-state machine with one transition on the given
              hexadecimal number.

       /simple_regex/
              A simple regular expression. Supports the notation '.', '*' and
              '[]', character ranges with '-', negating the sense of an OR
              expression with an initial '^', and escape sequences with '\'.
              Also supports one trailing flag: i. Use it to produce a
              case-insensitive regular expression, as in /GET/i.

       lit .. lit
              Specifies a range. The allowable upper and lower bounds are
              concat literals of length one and number machines. For example,
              0x10..0x20, 0..63, and 'a'..'z' are valid ranges.

       variable_name
              References the machine definition assigned to the variable name
              given.

       builtin_machine
              There are several builtin machines available. They are all
              two-state machines for the purpose of matching common classes of
              characters. They are:

              any    Any character in the alphabet.

              ascii  ASCII characters 0..127.

              extend ASCII extended characters. This is the range -128..127
                     for signed alphabets and the range 0..255 for unsigned
                     alphabets.

              alpha  Alphabetic characters /[A-Za-z]/.

              digit  Digits /[0-9]/.

              alnum  Alphanumerics /[0-9A-Za-z]/.

              lower  Lowercase characters /[a-z]/.

              upper  Uppercase characters /[A-Z]/.

              xdigit Hexadecimal digits /[0-9A-Fa-f]/.

              cntrl  Control characters 0..31.

              graph  Graphical characters /[!-~]/.

              print  Printable characters /[ -~]/.

              punct  Punctuation. Graphical characters that are not
                     alphanumerics /[!-/:-@\[-`{-~]/.

              space  Whitespace /[\t\v\f\n\r ]/.

              null   Zero-length string. Equivalent to '', "" and [].

              empty  Empty set. Matches nothing.

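       The character classes listed above can be cross-checked with a small
       C sketch. The fsm_ prefixed predicates are hypothetical helpers
       written for this example (they are not part of Ragel), and they assume
       an ASCII-compatible character set.

```c
/* Hypothetical helpers mirroring the ranges listed in this section.
   They assume an ASCII-compatible character set. */
static int in_range(int c, int lo, int hi)
{
    return c >= lo && c <= hi;
}

int fsm_alpha(int c) { return in_range(c, 'A', 'Z') || in_range(c, 'a', 'z'); }
int fsm_digit(int c) { return in_range(c, '0', '9'); }
int fsm_alnum(int c) { return fsm_alpha(c) || fsm_digit(c); }
int fsm_cntrl(int c) { return in_range(c, 0, 31); }
int fsm_graph(int c) { return in_range(c, '!', '~'); }
int fsm_print(int c) { return in_range(c, ' ', '~'); }

/* punct, written directly from the class /[!-/:-@\[-`{-~]/ */
int fsm_punct(int c)
{
    return in_range(c, '!', '/') || in_range(c, ':', '@') ||
           in_range(c, '[', '`') || in_range(c, '{', '~');
}

/* Returns 1 if punct equals graph minus alnum for every c in 0..127. */
int fsm_check_punct(void)
{
    int c;
    for (c = 0; c <= 127; c++) {
        if (fsm_punct(c) != (fsm_graph(c) && !fsm_alnum(c)))
            return 0;
    }
    return 1;
}
```

       Here fsm_check_punct() returns 1, confirming that the listed punct
       class is exactly graph minus alnum over 0..127. Note that cntrl as
       documented here covers 0..31, whereas the C library's iscntrl() also
       accepts 127 (DEL) in the "C" locale.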

BRIEF OPERATOR REFERENCE

       Operators are grouped by precedence, group 1 being the lowest and group
       9 the highest.

       GROUP 1:

       expr , expr
              Join machines together without drawing any transitions, setting
              up a start state, or setting up any final states. The start
              state must be explicitly specified with the "start" label. Final
              states may be specified with epsilon transitions to the
              implicitly created "final" state.

       GROUP 2:

       expr | expr
              Produces a machine that matches any string in machine one or
              machine two.

       expr & expr
              Produces a machine that matches any string that is in both
              machine one and machine two.

       expr - expr
              Produces a machine that matches any string that is in machine
              one but not in machine two.

       expr -- expr
              Strong Subtraction. Matches any string in machine one that does
              not have any string in machine two as a substring.

       GROUP 3:

       expr . expr
              Produces a machine that matches all the strings in machine one
              followed by all the strings in machine two.

       expr :> expr
              Entry-Guarded Concatenation: terminates machine one upon entry
              to machine two.

       expr :>> expr
              Finish-Guarded Concatenation: terminates machine one when
              machine two finishes.

       expr <: expr
              Left-Guarded Concatenation: gives a higher priority to machine
              one.

       NOTE: Concatenation is the default operator. Placing two machines next
       to each other with no operator between them results in the
       concatenation operation.

       GROUP 4:

       label: expr
              Attaches a label to an expression. Labels can be used by epsilon
              transitions and by fgoto and fcall statements in actions. Also
              note that referencing a machine definition causes the implicit
              creation of a label by the same name.

       GROUP 5:

       expr -> label
              Draws an epsilon transition to the state defined by label. Label
              must be a name in the current scope. Epsilon transitions are
              resolved when comma operators are evaluated and at the root of
              the expression tree of machine assignment/instantiation.

       GROUP 6: Actions

       An action may be a name predefined with an action statement or may be
       specified directly with '{' and '}' in the expression.

       expr > action
              Embeds action into starting transitions.

       expr @ action
              Embeds action into transitions that go into a final state.

       expr $ action
              Embeds action into all transitions. Does not include pending out
              transitions.

       expr % action
              Embeds action into pending out transitions from final states.

       GROUP 6: EOF Actions

       When a machine's finish routine is called, the current state's EOF
       actions are executed.

       expr >/ action
              Embed an EOF action into the start state.

       expr </ action
              Embed an EOF action into all states except the start state.

       expr $/ action
              Embed an EOF action into all states.

       expr %/ action
              Embed an EOF action into final states.

       expr @/ action
              Embed an EOF action into all states that are not final.

       expr <>/ action
              Embed an EOF action into all states that are not the start state
              and that are not final (middle states).

       GROUP 6: Global Error Actions

       Global error actions are stored in states until the final state machine
       has been fully constructed. They are then transferred to error
       transitions, giving the effect of a default action.

       expr >! action
              Embed a global error action into the start state.

       expr <! action
              Embed a global error action into all states except the start
              state.

       expr $! action
              Embed a global error action into all states.

       expr %! action
              Embed a global error action into the final states.

       expr @! action
              Embed a global error action into all states which are not final.

       expr <>! action
              Embed a global error action into all states which are not the
              start state and are not final (middle states).

       GROUP 6: Local Error Actions

       Local error actions are stored in states until the named machine is
       fully constructed. They are then transferred to error transitions,
       giving the effect of a default action for a section of the total
       machine. Note that the name may be omitted, in which case the action
       will be transferred to error actions upon construction of the current
       machine.

       expr >^ action
              Embed a local error action into the start state.

       expr <^ action
              Embed a local error action into all states except the start
              state.

       expr $^ action
              Embed a local error action into all states.

       expr %^ action
              Embed a local error action into the final states.

       expr @^ action
              Embed a local error action into all states which are not final.

       expr <>^ action
              Embed a local error action into all states which are not the
              start state and are not final (middle states).

       GROUP 6: To-State Actions

       To-state actions are stored in states and executed any time the machine
       moves into a state. This includes regular transitions and transfers of
       control such as fgoto. Note that setting the current state from outside
       the machine (for example during initialization) does not count as a
       transition into a state.

       expr >~ action
              Embed a to-state action into the start state.

       expr <~ action
              Embed a to-state action into all states except the start state.

       expr $~ action
              Embed a to-state action into all states.

       expr %~ action
              Embed a to-state action into the final states.

       expr @~ action
              Embed a to-state action into all states which are not final.

       expr <>~ action
              Embed a to-state action into all states which are not the start
              state and are not final (middle states).

       GROUP 6: From-State Actions

       From-state actions are executed whenever a state takes a transition on
       a character. This includes the error transition and a transition to
       self.

       expr >* action
              Embed a from-state action into the start state.

       expr <* action
              Embed a from-state action into every state except the start
              state.

       expr $* action
              Embed a from-state action into all states.

       expr %* action
              Embed a from-state action into the final states.

       expr @* action
              Embed a from-state action into all states which are not final.

       expr <>* action
              Embed a from-state action into all states which are not the
              start state and are not final (middle states).

       GROUP 6: Priority Assignment

       Priorities are assigned to names within transitions. Only priorities on
       the same name are allowed to interact. In the first form of priorities
       the name defaults to the name of the machine definition the priority is
       assigned in. Transitions do not have default priorities.

       expr > int
              Assigns the priority int in all transitions leaving the start
              state.

       expr @ int
              Assigns the priority int in all transitions that go into a final
              state.

       expr $ int
              Assigns the priority int in all existing transitions.

       expr % int
              Assigns the priority int in all pending out transitions.

       A second form of priority assignment allows the programmer to specify
       the name to which the priority is assigned, allowing interactions to
       cross machine definition boundaries.

       expr > (name, int)
              Assigns the priority int to name in all transitions leaving the
              start state.

       expr @ (name, int)
              Assigns the priority int to name in all transitions that go into
              a final state.

       expr $ (name, int)
              Assigns the priority int to name in all existing transitions.

       expr % (name, int)
              Assigns the priority int to name in all pending out transitions.

       GROUP 7:

       expr * Produces the Kleene star of a machine. Matches zero or more
              repetitions of the machine.

       expr **
              Longest-Match Kleene Star. This version of Kleene star puts a
              higher priority on staying in the machine over wrapping around
              and starting over. This operator is equivalent to
              ( ( expr ) $0 %1 )*.

       expr ? Produces a machine that matches the given machine or the null
              string. This operator is equivalent to ( expr | '' ).

       expr + Produces the machine concatenated with the Kleene star of
              itself. Matches one or more repetitions of the machine. This
              operator is equivalent to ( expr . expr* ).

       expr {n}
              Produces a machine that matches exactly n repetitions of expr.

       expr {,n}
              Produces a machine that matches anywhere from zero to n
              repetitions of expr.

       expr {n,}
              Produces a machine that matches n or more repetitions of expr.

       expr {n,m}
              Produces a machine that matches n to m repetitions of expr.

       GROUP 8:

       ! expr Produces a machine that matches any string not matched by the
              given machine. This operator is equivalent to
              ( extend* - expr ).

       ^ expr Character-Level Negation. Matches any single character not
              matched by the single character machine expr.

       GROUP 9:

       ( expr )
              Forces precedence on operators.

VALUES AVAILABLE IN CODE BLOCKS

       fc     The current character. Equivalent to *p.

       fpc    A pointer to the current character. Equivalent to p.

       fcurs  An integer value representing the current state.

       ftargs An integer value representing the target state.

       fentry(<label>)
              An integer value representing the entry point <label>.

STATEMENTS AVAILABLE IN CODE BLOCKS

       fhold; Do not advance over the current character. Equivalent to --p;.

       fexec <expr>;
              Sets the current character to something else. Equivalent to
              p = (<expr>)-1;.

       fgoto <label>;
              Jump to the machine defined by <label>.

       fgoto *<expr>;
              Jump to the entry point given by <expr>. The expression must
              evaluate to an integer value representing a state.

       fnext <label>;
              Set the next state to be the entry point defined by <label>.
              The fnext statement does not immediately jump to the specified
              state. Any action code following the statement is executed.

       fnext *<expr>;
              Set the next state to be the entry point given by <expr>. The
              expression must evaluate to an integer value representing a
              state.

       fcall <label>;
              Call the machine defined by <label>. The next fret will jump to
              the target of the transition on which the action is invoked.

       fcall *<expr>;
              Call the entry point given by <expr>. The next fret will jump to
              the target of the transition on which the action is invoked.

       fret;  Return to the target state of the transition on which the last
              fcall was made.

       fbreak;
              Save the current state and immediately break out of the machine.

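       The effect of fhold can be pictured with a hand-written C loop. This
       is an illustrative sketch, not Ragel output; count_tokens and the ST_
       names are made up for the example. It assumes, as the equivalences
       above imply, that p is advanced after each step, so a step that runs
       --p causes the same character to be seen again, here by a different
       state.

```c
#include <stddef.h>

/* Hand-written sketch (not Ragel output). Counts the numbers and the
   non-digit characters in a buffer such as "12+345". */
enum { ST_IDLE, ST_NUMBER };

void count_tokens(const char *p, size_t len, int *numbers, int *operators)
{
    const char *pe = p + len;
    int cs = ST_IDLE;
    *numbers = 0;
    *operators = 0;

    for (; p < pe; p++) {            /* the loop advances p after each step */
        int is_digit = (*p >= '0' && *p <= '9');
        switch (cs) {
        case ST_IDLE:
            if (is_digit)
                cs = ST_NUMBER;      /* first digit of a number */
            else
                (*operators)++;      /* any other character is one token */
            break;
        case ST_NUMBER:
            if (!is_digit) {
                (*numbers)++;        /* the number ended just before *p */
                cs = ST_IDLE;
                --p;                 /* like fhold: ST_IDLE sees *p again */
            }
            break;
        }
    }
    if (cs == ST_NUMBER)
        (*numbers)++;                /* number terminated by end of input */
}
```

       For the input "12+345" this yields two numbers and one operator: the
       '+' first ends the number, is held, and is then counted by the idle
       state. fexec follows the same idea with an arbitrary target, storing
       (<expr>)-1 so that the following advance lands on <expr>.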

CREDITS

       Ragel was written by Adrian Thurston <thurston@complang.org>.
       Objective-C output contributed by Erich Ocean. D output contributed by
       Alan West. Ruby output contributed by Victor Hugo Borja. C Sharp code
       generation contributed by Daniel Tang. Contributions to Java code
       generation by Colin Fleming.

SEE ALSO

       re2c(1), flex(1)

       Homepage: http://www.complang.org/ragel/

Ragel 6.6                          Dec 2009                           RAGEL(1)