RAGEL(1)                 Ragel State Machine Compiler                 RAGEL(1)

NAME

       ragel - compile regular languages into executable state machines

SYNOPSIS

       ragel [options] file

DESCRIPTION

       Ragel compiles executable finite state machines from regular languages.
       Ragel can generate C, C++, Objective-C, D, Java, or Ruby code. Ragel
       state machines can not only recognize byte sequences as regular
       expression machines do, but can also execute code at arbitrary points
       in the recognition of a regular language. User code is embedded using
       inline operators that do not disrupt the regular language syntax.

       The core language consists of standard regular expression operators,
       such as union, concatenation, and Kleene star, accompanied by action
       embedding operators. Ragel also provides operators that let you control
       any non-determinism that you create, construct scanners using the
       longest-match paradigm, and build state machines using the statechart
       model. It is also possible to influence the execution of a state
       machine from inside an embedded action by jumping or calling to other
       parts of the machine and reprocessing input.

       Ragel provides a very flexible interface to the host language that
       attempts to place minimal restrictions on how the generated code is
       used and integrated into the application. The generated code has no
       dependencies.

OPTIONS

       -h, -H, -?, --help
              Display help and exit.

       -v     Print version information and exit.

       -o file
              Write output to file. If -o is not given, a default file name is
              chosen by replacing the suffix of the input. For source files
              ending in .rh the suffix .h is used. For all other source files
              a suffix based on the output language is used (.c, .cpp, .m,
              .dot).

       -s     Print some statistics on standard error.

       --error-format=gnu
              Print error messages using the format "file:line:column:"
              (default).

       --error-format=msvc
              Print error messages using the format "file(line,column):".

       -d     Do not remove duplicate actions from action lists.

       -I dir
              Add dir to the list of directories to search for included and
              imported files.

       -n     Do not perform state minimization.

       -m     Perform minimization once, at the end of the state machine
              compilation.

       -l     Minimize after nearly every operation. Lists of like operations
              such as unions are minimized once at the end. This is the
              default minimization option.

       -e     Minimize after every operation.

       -x     Run the frontend only: emit XML intermediate format.

       -V     Generate a dot file for Graphviz.

       -p     Display printable characters on labels.

       -S <spec>
              FSM specification to output.

       -M <machine>
              Machine definition/instantiation to output.

       -C     The host language is C, C++, Obj-C or Obj-C++. This is the
              default host language option.

       -D     The host language is D.

       -J     The host language is Java.

       -R     The host language is Ruby.

       -L     Inhibit writing of #line directives.

       -T0    Table-driven FSM (default).

       -T1    Faster table-driven FSM.

       -F0    Flat table-driven FSM.

       -F1    Faster flat table-driven FSM.

       -G0    Goto-driven FSM.

       -G1    Faster goto-driven FSM.

       -G2    Really fast goto-driven FSM.

       -P<N>  N-Way Split really fast goto-driven FSM.

RAGEL INPUT

       NOTE: This is a very brief description of Ragel input. Ragel is
       described in more detail in the user guide available from the homepage
       (see below).

       Ragel normally passes input files straight to the output. When it sees
       an FSM specification that contains machine instantiations it stops to
       generate the state machine. If there are write statements (such as
       "write exec") then ragel emits the corresponding code. There can be any
       number of FSM specifications in an input file. A multi-line FSM
       specification starts with '%%{' and ends with '}%%'. A single-line FSM
       specification starts with %% and ends at the first newline.
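
       As an illustration, a complete input file might look like the sketch
       below. The file name hello.rl, the machine name hello and the C
       variable matched are made up for this example. The variables p, pe
       and cs are the default names used by the generated code. The sketch
       has not been tested against a particular Ragel release.

```ragel
/* hello.rl: hypothetical input file. Everything outside the %% blocks
 * is C and is passed through to the output unchanged. */
#include <stdio.h>
#include <string.h>

%%{
    # Multi-line FSM specification.
    machine hello;

    # Instantiate the machine; set a flag when the final 'o' is consumed.
    main := 'hello' @{ matched = 1; };
}%%

/* Single-line FSM specification: emit the static data. */
%% write data;

int main(void)
{
    const char *input = "hello";
    const char *p = input;                   /* current position */
    const char *pe = input + strlen(input);  /* end of input */
    int cs;                                  /* current state */
    int matched = 0;

    %% write init;
    %% write exec;

    printf("matched = %d\n", matched);
    return 0;
}
```

       Following the SYNOPSIS, such a file could be compiled with
       "ragel -C -o hello.c hello.rl" and the result passed to a C compiler.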

FSM STATEMENTS

       Machine Name:
              Set the name of the machine. If given, it must be the first
              statement.

       Alphabet Type:
              Set the data type of the alphabet.

       GetKey:
              Specify how to retrieve the alphabet character from the element
              type.

       Include:
              Include a machine of the same name as the current one, or of a
              different name, from either the current file or some other file.

       Action Definition:
              Define an action that can be invoked by the FSM.

       Fsm Definition, Instantiation and Longest Match Instantiation:
              Used to build FSMs. The syntax is described in the next few
              sections.

       Access:
              Specify how to access the persistent state machine variables.

       Write: Write some component of the machine.

       Variable:
              Override the default variable names (p, pe, cs, act, etc).
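
       A sketch of how several of these statements can appear together in
       one specification follows. The machine name counter, the pointer fsm
       and its members cur, end and digits are hypothetical names belonging
       to the host program, which is not shown. GetKey and Include are left
       out to keep the fragment short.

```ragel
%%{
    # Machine Name: must be the first statement.
    machine counter;

    # Alphabet Type: input elements are unsigned bytes.
    alphtype unsigned char;

    # Access: persistent variables such as cs are reached through fsm.
    access fsm->;

    # Variable: override the default names of the data pointers.
    variable p fsm->cur;
    variable pe fsm->end;

    # Action Definition.
    action count_digit { fsm->digits++; }

    # Fsm Definition (=) and Instantiation (:=).
    number = digit+ $count_digit;
    main := ( number ' ' )*;

    # Write: emit the static data for this machine.
    write data;
}%%
```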

BASIC MACHINES

       The basic machines are the base operands of the regular language
       expressions.

       'hello'
              Concat literal. Produces a concatenation of the characters in
              the string. Supports escape sequences with '\'. The result
              will have a start state and a transition to a new state for each
              character in the string. The last state in the sequence will be
              made final. To make the string case-insensitive, append an 'i'
              to the string, as in 'cmd'i.

       "hello"
              Identical to single quote version.

       [hello]
              Or literal. Produces a union of characters. Supports character
              ranges with '-', negating the sense of the union with an initial
              '^', and escape sequences with '\'. The result will have two
              states with a transition between them for each character or
              range.

       NOTE: '', "", and [] produce null FSMs. Null machines have one state
       that is both a start state and a final state and match the zero-length
       string. A null machine may be created with the null builtin machine.

       integer
              Makes a two-state machine with one transition on the given
              integer number.

       hex    Makes a two-state machine with one transition on the given
              hexadecimal number.

       /simple_regex/
              A simple regular expression. Supports the notation '.', '*' and
              '[]', character ranges with '-', negating the sense of an OR
              expression with an initial '^', and escape sequences with '\'.
              Also supports one trailing flag: i. Use it to produce a
              case-insensitive regular expression, as in /GET/i.

       lit .. lit
              Specifies a range. The allowable upper and lower bounds are
              concat literals of length one and number machines. For example,
              0x10..0x20, 0..63, and 'a'..'z' are valid ranges.

       variable_name
              References the machine definition assigned to the variable name
              given.

       builtin_machine
              There are several builtin machines available. They are all
              two-state machines for the purpose of matching common classes
              of characters. They are:

              any    Any character in the alphabet.

              ascii  ASCII characters 0..127.

              extend ASCII extended characters. This is the range -128..127
                     for signed alphabets and the range 0..255 for unsigned
                     alphabets.

              alpha  Alphabetic characters /[A-Za-z]/.

              digit  Digits /[0-9]/.

              alnum  Alphanumerics /[0-9A-Za-z]/.

              lower  Lowercase characters /[a-z]/.

              upper  Uppercase characters /[A-Z]/.

              xdigit Hexadecimal digits /[0-9A-Fa-f]/.

              cntrl  Control characters 0..31.

              graph  Graphical characters /[!-~]/.

              print  Printable characters /[ -~]/.

              punct  Punctuation. Graphical characters that are not
                     alphanumerics /[!-/:-@\[-`{-~]/.

              space  Whitespace /[\t\v\f\n\r ]/.

              null   Zero-length string. Equivalent to '', "" and [].

              empty  Empty set. Matches nothing.
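
       The fragment below sketches one definition for each kind of basic
       machine. All names on the left-hand side are made up for
       illustration. The definitions only name machines; the final
       instantiation of main is what causes a state machine to be generated.

```ragel
%%{
    machine basics;

    greeting   = 'hello';        # concat literal
    command    = 'cmd'i;         # case-insensitive concat literal
    vowel      = [aeiou];        # or literal
    not_digit  = [^0-9];         # negated or literal with a range
    linefeed   = 10;             # integer
    tab_char   = 0x09;           # hex
    method     = /GET/i;         # simple regular expression
    hex_letter = 'a'..'f';       # range
    ident      = alpha alnum*;   # builtin machines

    # variable_name: reference the definitions above.
    main := ( greeting | command | method ) tab_char ident linefeed;
}%%
```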

BRIEF OPERATOR REFERENCE

       Operators are grouped by precedence, group 1 being the lowest and group
       9 the highest.

       GROUP 1:

       expr , expr
              Join machines together without drawing any transitions, setting
              up a start state or any final states. The start state must be
              explicitly specified with the "start" label. Final states may be
              specified with an epsilon transition to the implicitly created
              "final" state.

       GROUP 2:

       expr | expr
              Produces a machine that matches any string in machine one or
              machine two.

       expr & expr
              Produces a machine that matches any string that is in both
              machine one and machine two.

       expr - expr
              Produces a machine that matches any string that is in machine
              one but not in machine two.

       expr -- expr
              Strong Subtraction. Matches any string in machine one that does
              not have any string in machine two as a substring.

       GROUP 3:

       expr . expr
              Produces a machine that matches all the strings in machine one
              followed by all the strings in machine two.

       expr :> expr
              Entry-Guarded Concatenation: terminates machine one upon entry
              to machine two.

       expr :>> expr
              Finish-Guarded Concatenation: terminates machine one when
              machine two finishes.

       expr <: expr
              Left-Guarded Concatenation: gives a higher priority to machine
              one.

       NOTE: Concatenation is the default operator. Two machines next to each
       other with no operator between them results in the concatenation
       operation.

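       The set and concatenation operators of groups 2 and 3 might be
       combined as in the following sketch. The machine and definition names
       are hypothetical, and the fragment is meant to show syntax rather than
       a useful grammar.

```ragel
%%{
    machine set_ops;

    keyword  = 'if' | 'else' | 'while';   # union
    word     = lower+;
    non_kw   = word - keyword;            # difference: words, minus keywords
    two_char = keyword & ( lower lower ); # intersection: only 'if' remains

    crlf     = '\r\n';
    line     = ( any* -- crlf ) crlf;     # strong subtraction, then
                                          # implicit concatenation
    comment  = '/*' any* :>> '*/';        # finish-guarded: stop at first */
    to_eol   = any* :> '\n';              # entry-guarded: stop at first \n
    padded   = ' '* <: any*;              # left-guarded: leading spaces
                                          # stay in machine one

    main := line | comment;
}%%
```
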
       GROUP 4:

       label: expr
              Attaches a label to an expression. Labels can be used by epsilon
              transitions and fgoto and fcall statements in actions. Also note
              that the referencing of a machine definition causes the implicit
              creation of a label by the same name.

       GROUP 5:

       expr -> label
              Draws an epsilon transition to the state defined by label. Label
              must be a name in the current scope. Epsilon transitions are
              resolved when comma operators are evaluated and at the root of
              the expression tree of machine assignment/instantiation.

       GROUP 6: Actions

       An action may be a name predefined with an action statement or may be
       specified directly with '{' and '}' in the expression.

       expr > action
              Embeds action into starting transitions.

       expr @ action
              Embeds action into transitions that go into a final state.

       expr $ action
              Embeds action into all transitions. Does not include pending out
              transitions.

       expr % action
              Embeds action into pending out transitions from final states.

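       A sketch of the four embedding operators applied to one expression
       follows. The action names are made up, and the printf calls assume
       that the surrounding C host code includes <stdio.h>. Because every
       transition of lower+ leads into a final state, last_char runs on each
       character here. end_word runs on the transition that leaves the word,
       which in this sketch is the one taken on the following space.

```ragel
%%{
    machine word_actions;

    action begin_word { printf("begin\n"); }
    action each_char  { printf("char %c\n", fc); }
    action last_char  { printf("entered a final state\n"); }
    action end_word   { printf("end\n"); }

    word = lower+ >begin_word $each_char @last_char %end_word;
    main := ( word ' ' )*;
}%%
```
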
       GROUP 6: EOF Actions

       When a machine's finish routine is called the current state's EOF
       actions are executed.

       expr >/ action
              Embed an EOF action into the start state.

       expr </ action
              Embed an EOF action into all states except the start state.

       expr $/ action
              Embed an EOF action into all states.

       expr %/ action
              Embed an EOF action into final states.

       expr @/ action
              Embed an EOF action into all states that are not final.

       expr <>/ action
              Embed an EOF action into all states that are not the start state
              and that are not final (middle states).

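       For example, EOF actions could be used to tell complete input from
       truncated input, as in this sketch. The action names are hypothetical
       and the printf calls assume a C host that includes <stdio.h>. How the
       end of input is signalled to the generated code is not shown and may
       depend on the Ragel version.

```ragel
%%{
    machine eof_demo;

    action complete   { printf("input ended on a complete number\n"); }
    action incomplete { printf("input ended too early\n"); }

    # %/ targets final states, @/ targets states that are not final.
    main := ( digit+ '.' digit+ ) %/complete @/incomplete;
}%%
```
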
       GROUP 6: Global Error Actions

       Global error actions are stored in states until the final state machine
       has been fully constructed. They are then transferred to error
       transitions, giving the effect of a default action.

       expr >! action
              Embed a global error action into the start state.

       expr <! action
              Embed a global error action into all states except the start
              state.

       expr $! action
              Embed a global error action into all states.

       expr %! action
              Embed a global error action into the final states.

       expr @! action
              Embed a global error action into all states which are not final.

       expr <>! action
              Embed a global error action into all states which are not the
              start state and are not final (middle states).

       GROUP 6: Local Error Actions

       Local error actions are stored in states until the named machine is
       fully constructed. They are then transferred to error transitions,
       giving the effect of a default action for a section of the total
       machine. Note that the name may be omitted, in which case the action
       will be transferred to error actions upon construction of the current
       machine.

       expr >^ action
              Embed a local error action into the start state.

       expr <^ action
              Embed a local error action into all states except the start
              state.

       expr $^ action
              Embed a local error action into all states.

       expr %^ action
              Embed a local error action into the final states.

       expr @^ action
              Embed a local error action into all states which are not final.

       expr <>^ action
              Embed a local error action into all states which are not the
              start state and are not final (middle states).

       GROUP 6: To-State Actions

       To-state actions are stored in states and executed any time the machine
       moves into a state. This includes regular transitions and transfers of
       control such as fgoto. Note that setting the current state from outside
       the machine (for example during initialization) does not count as a
       transition into a state.

       expr >~ action
              Embed a to-state action into the start state.

       expr <~ action
              Embed a to-state action into all states except the start state.

       expr $~ action
              Embed a to-state action into all states.

       expr %~ action
              Embed a to-state action into the final states.

       expr @~ action
              Embed a to-state action into all states which are not final.

       expr <>~ action
              Embed a to-state action into all states which are not the start
              state and are not final (middle states).

       GROUP 6: From-State Actions

       From-state actions are executed whenever a state takes a transition on
       a character. This includes the error transition and a transition to
       self.

       expr >* action
              Embed a from-state action into the start state.

       expr <* action
              Embed a from-state action into every state except the start
              state.

       expr $* action
              Embed a from-state action into all states.

       expr %* action
              Embed a from-state action into the final states.

       expr @* action
              Embed a from-state action into all states which are not final.

       expr <>* action
              Embed a from-state action into all states which are not the
              start state and are not final (middle states).

       GROUP 6: Priority Assignment

       Priorities are assigned to names within transitions. Only priorities on
       the same name are allowed to interact. In the first form of priorities
       the name defaults to the name of the machine definition the priority is
       assigned in. Transitions do not have default priorities.

       expr > int
              Assigns the priority int in all transitions leaving the start
              state.

       expr @ int
              Assigns the priority int in all transitions that go into a final
              state.

       expr $ int
              Assigns the priority int in all existing transitions.

       expr % int
              Assigns the priority int in all pending out transitions.

       A second form of priority assignment allows the programmer to specify
       the name to which the priority is assigned, allowing interactions to
       cross machine definition boundaries.

       expr > (name, int)
              Assigns the priority int to name in all transitions leaving the
              start state.

       expr @ (name, int)
              Assigns the priority int to name in all transitions that go into
              a final state.

       expr $ (name, int)
              Assigns the priority int to name in all existing transitions.

       expr % (name, int)
              Assigns the priority int to name in all pending out transitions.

       GROUP 7:

       expr * Produces the Kleene star of a machine. Matches zero or more
              repetitions of the machine.

       expr **
              Longest-Match Kleene Star. This version of Kleene star puts a
              higher priority on staying in the machine over wrapping around
              and starting over. This operator is equivalent to ( ( expr ) $0
              %1 )*.

       expr ? Produces a machine that accepts the machine given or the null
              string. This operator is equivalent to ( expr | '' ).

       expr + Produces the machine concatenated with the Kleene star of itself.
              Matches one or more repetitions of the machine. This operator
              is equivalent to ( expr . expr* ).

       expr {n}
              Produces a machine that matches exactly n repetitions of expr.

       expr {,n}
              Produces a machine that matches anywhere from zero to n
              repetitions of expr.

       expr {n,}
              Produces a machine that matches n or more repetitions of expr.

       expr {n,m}
              Produces a machine that matches n to m repetitions of expr.

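       The repetition operators can be sketched as follows. The definition
       names are made up. The dotted_quad machine only checks the shape of
       the input and does not restrict each group to the range 0..255.

```ragel
%%{
    machine reps;

    octet       = digit{1,3};               # one to three digits
    dotted_quad = octet ( '.' octet ){3};   # exactly three more groups
    sign        = [+\-]?;                   # zero or one
    signed_int  = sign digit+;              # one or more digits
    padding     = ' '{,8};                  # zero to eight spaces
    long_id     = alnum{12,};               # twelve or more
    blanks      = ' '*;                     # zero or more

    main := ( dotted_quad | signed_int ) blanks '\n';
}%%
```
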
       GROUP 8:

       ! expr Produces a machine that matches any string not matched by the
              given machine. This operator is equivalent to ( any* - expr ).

       ^ expr Character-Level Negation. Matches any single character not
              matched by the single character machine expr.

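       A short sketch of the two negation operators, using made-up names:

```ragel
%%{
    machine negation;

    # ^ works on single characters: anything except a double quote.
    not_quote = ^'"';
    quoted    = '"' not_quote* '"';

    # ! works on whole strings: any lowercase word other than 'end'.
    not_end   = lower+ & !'end';

    main := quoted ' ' not_end '\n';
}%%
```
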
       GROUP 9:

       ( expr )
              Forces precedence on operators.

VALUES AVAILABLE IN CODE BLOCKS

       fc     The current character. Equivalent to *p.

       fpc    A pointer to the current character. Equivalent to p.

       fcurs  An integer value representing the current state.

       ftargs An integer value representing the target state.

       fentry(<label>)
              An integer value representing the entry point <label>.

STATEMENTS AVAILABLE IN CODE BLOCKS

       fhold; Do not advance over the current character. Equivalent to --p;.

       fexec <expr>;
              Sets the current character to something else. Equivalent to p =
              (<expr>)-1;

       fgoto <label>;
              Jump to the machine defined by <label>.

       fgoto *<expr>;
              Jump to the entry point given by <expr>. The expression must
              evaluate to an integer value representing a state.

       fnext <label>;
              Set the next state to be the entry point defined by <label>.
              The fnext statement does not immediately jump to the specified
              state. Any action code following the statement is executed.

       fnext *<expr>;
              Set the next state to be the entry point given by <expr>. The
              expression must evaluate to an integer value representing a
              state.

       fcall <label>;
              Call the machine defined by <label>. The next fret will jump to
              the target of the transition on which the action is invoked.

       fcall *<expr>;
              Call the entry point given by <expr>. The next fret will jump to
              the target of the transition on which the action is invoked.

       fret;  Return to the target state of the transition on which the last
              fcall was made.

       fbreak;
              Save the current state and immediately break out of the machine.
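
       As a sketch of how these statements can be combined, the fragment
       below uses fhold and fgoto for simple error recovery: when a line does
       not match, the rest of it is skipped and matching resumes. The machine
       and action names are hypothetical, and the printf call assumes a C
       host that includes <stdio.h>. Text inside the action braces is host
       code, so C comments are used there.

```ragel
%%{
    machine recover;

    # Consume the rest of a bad line, then jump back to main.
    skip_line := [^\n]* '\n' @{ fgoto main; };

    action bad_line {
        printf("bad line\n");
        fhold;             /* do not advance over the offending character */
        fgoto skip_line;   /* reprocess it in skip_line */
    }

    # Accept any number of "ok" lines; any other input runs bad_line.
    main := ( 'ok' '\n' )* $!bad_line;
}%%
```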

BUGS

       Ragel is still under development and has not yet matured. There are
       probably many bugs.

CREDITS

       Ragel was written by Adrian Thurston <thurston@cs.queensu.ca>.
       Objective-C output contributed by Erich Ocean. D output contributed by
       Alan West.

SEE ALSO

       rlgen-cd(1), rlgen-dot(1), rlgen-java(1), re2c(1), flex(1)

       Homepage: http://www.cs.queensu.ca/~thurston/ragel/

Ragel 6.2                         March 2008                          RAGEL(1)