1
2
3
4RAGEL(1)                 Ragel State Machine Compiler                 RAGEL(1)
5
6
7

NAME

9       ragel - compile regular languages into executable state machines
10

SYNOPSIS

12       ragel [options] file
13

DESCRIPTION

15       Ragel compiles executable finite state machines from regular languages.
16       Ragel can generate C, C++, Objective-C, D,  Go,  or  Java  code.  Ragel
17       state machines can not only recognize byte sequences as regular expres‐
18       sion machines do, but can also execute code at arbitrary points in  the
19       recognition  of a regular language.  User code is embedded using inline
20       operators that do not disrupt the regular language syntax.
21
22       The core language consists of standard  regular  expression  operators,
23       such as union, concatenation and kleene star, accompanied by action em‐
24       bedding operators. Ragel also provides operators that let  you  control
25       any non-determinism that you create, construct scanners using the long‐
26       est match paradigm, and  build  state  machines  using  the  statechart
27       model.  It  is  also possible to influence the execution of a state ma‐
28       chine from inside an embedded action by jumping  or  calling  to  other
29       parts of the machine and reprocessing input.
30
31       Ragel provides a very flexibile interface to the host language that at‐
32       tempts to place minimal restrictions on how the generated code is  used
33       and  integrated  into the application. The generated code has no depen‐
34       dencies.
35
36

OPTIONS

38       -h, -H, -?, --help
39              Display help and exit.
40
41       -v     Print version information and exit.
42
43       -o  file
44              Write output to file. If -o is not given, a default file name is
45              chosen  by  replacing the file extenstion of the input file. For
46              source files ending in .rh the suffix .h is used. For all  other
47              source  files a suffix based on the output language is used (.c,
48              .cpp, .m, etc.). If -o is not given for Graphviz output the gen‐
49              erated dot file is written to standard output.
50
51       -s     Print some statistics on standard error.
52
53       --error-format=gnu
54              Print  error  messages using the format "file:line:column:" (de‐
55              fault)
56
57       --error-format=msvc
58              Print error messages using the format "file(line,column):"
59
60       -d     Do not remove duplicate actions from action lists.
61
62       -I  dir
63              Add dir to the list of directories to search  for  included  and
64              imported files
65
66       -n     Do not perform state minimization.
67
68       -m     Perform  minimization once, at the end of the state machine com‐
69              pilation.
70
71       -l     Minimize after nearly every operation. Lists of like  operations
72              such  as  unions  are minimized once at the end. This is the de‐
73              fault minimization option.
74
75       -e     Minimize after every operation.
76
77       -x     Compile the state machines and emit an XML representation of the
78              host data and the machines.
79
80       -V     Generate a dot file for Graphviz.
81
82       -p     Display printable characters on labels.
83
84       -L     Inhibit writing of #line directives.
85
86       -S <spec>
87              FSM specification to output.
88
89       -M <machine>
90              Machine definition/instantiation to output.
91
92       -C     The  host  language is C, C++, Obj-C or Obj-C++. This is the de‐
93              fault host language option.
94
95       --asm --gas-x86-64-sys-v
96              GNU AS, x86_64, System V ABI ASM is generated in  a  code  style
97              equiv to -G2
98
99       -D     The host language is D.
100
101       -Z     The host language is Go.
102
103       -J     The host language is Java.
104
105       -R     The host language is Ruby.
106
107       -A     The host language is C#
108
109       -O     The host language is OCaml.
110
111       -U     The host language is Rust.
112
113       -Y     The host language is Julia.
114
115       -K     The host language is Crack.
116
117       -P     The host language is JavaScript.
118
119       -T0    (C/D/Java/Ruby/C#)  Generate a table driven FSM. This is the de‐
120              fault code style.  The table driven FSM represents the state ma‐
121              chine  as  static data. There are tables of states, transitions,
122              indices and actions. The current state is stored in a  variable.
123              The  execution is a loop that looks that given the current state
124              and current character to process looks up the transition to take
125              using  a  binary  search,  executes any actions and moves to the
126              target state. In  general,  the  table  driven  FSM  produces  a
127              smaller  binary and requires a less expensive host language com‐
128              pile but results in slower running code. The table driven FSM is
129              suitable for any FSM.
130
131       -T1    (C/D/Ruby/C#)  Generate  a  faster table driven FSM by expanding
132              action lists in the action execute code.
133
134       -F0    (C/D/Ruby/C#) Generate a flat table driven FSM. Transitions  are
135              represented  as an array indexed by the current alphabet charac‐
136              ter. This eliminates the need for  a  binary  search  to  locate
137              transitions  and  produces faster code, however it is only suit‐
138              able for small alphabets.
139
140       -F1    (C/D/Ruby/C#) Generate a faster flat table driven FSM by expand‐
141              ing action lists in the action execute code.
142
143       -G0    (C/D/C#)  Generate a goto driven FSM. The goto driven FSM repre‐
144              sents the state machine as a series of goto statements. While in
145              the  machine, the current state is stored by the processor's in‐
146              struction pointer. The execution is a flat function  where  con‐
147              trol  is passed from state to state using gotos. In general, the
148              goto FSM produces faster code but results in a larger binary and
149              a more expensive host language compile.
150
151       -G1    (C/D/C#)  Generate  a faster goto driven FSM by expanding action
152              lists in the action execute code.
153
154       -G2    (C/D/Go) Generate a really fast goto driven FSM by embedding ac‐
155              tion lists in the state machine control code.
156
157       --nfa-conds-depth=D
158              Search  for  high-cost conditions inside a prefix of the machine
159              (depth D from start state). Search is rooted at NFA  union  con‐
160              tructs.
161
162       --nfa-term-check
163              Search  for  condition-based  general  repetitions that will not
164              function properly and must be NFA reps. Search is rooted at  NFA
165              union constructs.
166
167       --nfa-intermed-state-limit=L
168              Report fail if number of states exceeds this during compilation.
169
170       --nfa-final-state-limit=L
171              Report a fail if number states in final machine exceeds this.
172
173       --nfa-breadth-check=E1,E2,..
174              Report  breadth  cost  of named entry points by (and start). Re‐
175              porting starts at NFA union contructs.
176
177       --input-histogram=FN
178              Input char histogram for breadth check. If  unspecified  a  flat
179              histogram is used.
180

RAGEL INPUT

182       NOTE:  This  is  a  very brief description of Ragel input. Ragel is de‐
183       scribed in more detail in the user guide available  from  the  homepage
184       (see below).
185
186       Ragel  normally passes input files straight to the output. When it sees
187       an FSM specification that contains machine instantiations it  stops  to
188       generate  the  state  machine.  If  there are write statements (such as
189       "write exec") then ragel emits the corresponding code. There can be any
190       number  of FSM specifications in an input file. A multi-line FSM speci‐
191       fication starts with '%%{' and ends with '}%%'. A single line FSM spec‐
192       ification starts with %% and ends at the first newline.
193

FSM STATEMENTS

195       Machine Name:
196              Set  the the name of the machine. If given, it must be the first
197              statement.
198
199       Alphabet Type:
200              Set the data type of the alphabet.
201
202       GetKey:
203              Specify how to retrieve the alphabet character from the  element
204              type.
205
206       Include:
207              Include  a machine of same name as the current or of a different
208              name in either the current file or some other file.
209
210       Action Definition:
211              Define an action that can be invoked by the FSM.
212
213       Fsm Definition, Instantiation and Longest Match Instantiation:
214              Used to build FSMs. Syntax description in next few sections.
215
216       Access:
217              Specify how to access the persistent state machine variables.
218
219       Write: Write some component of the machine.
220
221       Variable:
222              Override the default variable names (p, pe, cs, act, etc).
223

BASIC MACHINES

225       The basic machines are the base operands of the  regular  language  ex‐
226       pressions.
227
228       'hello'
229              Concat  literal.  Produces  a concatenation of the characters in
230              the string.  Supports escape sequences  with  '\'.   The  result
231              will have a start state and a transition to a new state for each
232              character in the string. The last state in the sequence will  be
233              made  final.  To make the string case-insensitive, append an 'i'
234              to the string, as in 'cmd'i.
235
236       "hello"
237              Identical to single quote version.
238
239       [hello]
240              Or literal. Produces a union of characters.  Supports  character
241              ranges with '-', negating the sense of the union with an initial
242              '^' and escape sequences with '\'.  The  result  will  have  two
243              states  with  a  transition  between  them for each character or
244              range.
245
246       NOTE: '', "", and [] produce null FSMs. Null machines  have  one  state
247       that  is both a start state and a final state and match the zero length
248       string. A null machine may be created with the null builtin machine.
249
250       integer
251              Makes a two state machine with one transition on the given inte‐
252              ger number.
253
254       hex    Makes  a two state machine with one transition on the given hex‐
255              idecimal number.
256
257       /simple_regex/
258              A simple regular expression. Supports the notation '.', '*'  and
259              '[]', character ranges with '-', negating the sense of an OR ex‐
260              pression with and initial '^' and  escape  sequences  with  '\'.
261              Also supports one trailing flag: i. Use it to produce a case-in‐
262              sensitive regular expression, as in /GET/i.
263
264       lit .. lit
265              Specifies a range. The allowable upper and lower bounds are con‐
266              cat  literals  of  length one and number machines.  For example,
267              0x10..0x20,  0..63, and 'a'..'z' are valid ranges.
268
269       variable_name
270              References the machine definition assigned to the variable  name
271              given.
272
273       builtin_machine
274              There  are  several builtin machines available. They are all two
275              state machines for the purpose of  matching  common  classes  of
276              characters. They are:
277
278              any    Any character in the alphabet.
279
280              ascii  Ascii characters 0..127.
281
282              extend Ascii  extended  characters.  This is the range -128..127
283                     for signed alphabets and the range  0..255  for  unsigned
284                     alphabets.
285
286              alpha  Alphabetic characters /[A-Za-z]/.
287
288              digit  Digits /[0-9]/.
289
290              alnum  Alpha numerics /[0-9A-Za-z]/.
291
292              lower  Lowercase characters /[a-z]/.
293
294              upper  Uppercase characters /[A-Z]/.
295
296              xdigit Hexidecimal digits /[0-9A-Fa-f]/.
297
298              cntrl  Control characters 0..31, 127.
299
300              graph  Graphical characters /[!-~]/.
301
302              print  Printable characters /[ -~]/.
303
304              punct  Punctuation.  Graphical characters that are not alpha-nu‐
305                     merics /[!-/:-@\[-`{-~]/.
306
307              space  Whitespace /[\t\v\f\n\r ]/.
308
309              null   Zero length string. Equivalent to '', "" and [].
310
311              empty  Empty set. Matches nothing.
312

BRIEF OPERATOR REFERENCE

314       Operators are grouped by precedence, group 1 being the lowest and group
315       6 the highest.
316
317       GROUP 1:
318
319       expr , expr
320              Join  machines together without drawing any transitions, setting
321              up a start state or any final states. Start state  must  be  ex‐
322              plicitly  specified  with the "start" label. Final states may be
323              specified with the an epsilon transitions to the implicitly cre‐
324              ated "final" state.
325
326       GROUP 2:
327
328       expr | expr
329              Produces a machine that matches any string in machine one or ma‐
330              chine two.
331
332       expr & expr
333              Produces a machine that matches any string that is in  both  ma‐
334              chine one and machine two.
335
336       expr - expr
337              Produces  a  machine  that matches any string that is in machine
338              one but not in machine two.
339
340       expr -- expr
341              Strong Subtraction. Matches any string in machine one that  does
342              not have any string in machine two as a substring.
343
344       GROUP 3:
345
346       expr . expr
347              Produces  a  machine that matches all the strings in machine one
348              followed by all the strings in machine two.
349
350       expr :> expr
351              Entry-Guarded Concatenation: terminates machine one  upon  entry
352              to machine two.
353
354       expr :>> expr
355              Finish-Guarded  Concatenation:  terminates  machine one when ma‐
356              chine two finishes.
357
358       expr <: expr
359              Left-Guarded Concatenation: gives a higher priority  to  machine
360              one.
361
362       NOTE:  Concatenation is the default operator. Two machines next to each
363       other with no operator between them results in the concatenation opera‐
364       tion.
365
366       GROUP 4:
367
368       label: expr
369              Attaches a label to an expression. Labels can be used by epsilon
370              transitions and fgoto and fcall statements in actions. Also note
371              that the referencing of a machine definition causes the implicit
372              creation of label by the same name.
373
374       GROUP 5:
375
376       expr -> label
377              Draws an epsilon transition to the state defined by label. Label
378              must be a name in the current scope. Epsilon transitions are re‐
379              solved when comma operators are evaluated and at the root of the
380              expression tree of machine assignment/instantiation.
381
382       GROUP 6: Actions
383
384       An  action  may be a name predefined with an action statement or may be
385       specified directly with '{' and '}' in the expression.
386
387       expr > action
388              Embeds action into starting transitions.
389
390       expr @ action
391              Embeds action into transitions that go into a final state.
392
393       expr $ action
394              Embeds action into all transitions. Does not include pending out
395              transitions.
396
397       expr % action
398              Embeds action into pending out transitions from final states.
399
400       GROUP 6: EOF Actions
401
402       When  a  machine's finish routine is called the current state's EOF ac‐
403       tions are executed.
404
405       expr >/ action
406              Embed an EOF action into the start state.
407
408       expr </ action
409              Embed an EOF action into all states except the start state.
410
411       expr $/ action
412              Embed an EOF action into all states.
413
414       expr %/ action
415              Embed an EOF action into final states.
416
417       expr @/ action
418              Embed an EOF action into all states that are not final.
419
420       expr <>/ action
421              Embed an EOF action into all states that are not the start state
422              and that are not final (middle states).
423
424       GROUP 6: Global Error Actions
425
426       Global error actions are stored in states until the final state machine
427       has been fully constructed. They are then transferred to error  transi‐
428       tions, giving the effect of a default action.
429
430       expr >! action
431              Embed a global error action into the start state.
432
433       expr <! action
434              Embed  a  global  error  action into all states except the start
435              state.
436
437       expr $! action
438              Embed a global error action into all states.
439
440       expr %! action
441              Embed a global error action into the final states.
442
443       expr @! action
444              Embed a global error action into all states which are not final.
445
446       expr <>! action
447              Embed a global error action into all states which  are  not  the
448              start state and are not final (middle states).
449
450       GROUP 6: Local Error Actions
451
452       Local  error  actions  are  stored in states until the named machine is
453       fully constructed. They are then transferred to error transitions, giv‐
454       ing  the effect of a default action for a section of the total machine.
455       Note that the name may be omitted, in which case  the  action  will  be
456       transferred to error actions upon construction of the current machine.
457
458       expr >^ action
459              Embed a local error action into the start state.
460
461       expr <^ action
462              Embed  a  local  error  action  into all states except the start
463              state.
464
465       expr $^ action
466              Embed a local error action into all states.
467
468       expr %^ action
469              Embed a local error action into the final states.
470
471       expr @^ action
472              Embed a local error action into all states which are not final.
473
474       expr <>^ action
475              Embed a local error action into all states  which  are  not  the
476              start state and are not final (middle states).
477
478       GROUP 6: To-State Actions
479
480       To state actions are stored in states and executed any time the machine
481       moves into a state. This includes regular transitions, and transfers of
482       control such as fgoto. Note that setting the current state from outside
483       the machine (for example during initialization) does  not  count  as  a
484       transition into a state.
485
486       expr >~ action
487              Embed a to-state action action into the start state.
488
489       expr <~ action
490              Embed a to-state action into all states except the start state.
491
492       expr $~ action
493              Embed a to-state action into all states.
494
495       expr %~ action
496              Embed a to-state action into the final states.
497
498       expr @~ action
499              Embed a to-state action into all states which are not final.
500
501       expr <>~ action
502              Embed  a to-state action into all states which are not the start
503              state and are not final (middle states).
504
505       GROUP 6: From-State Actions
506
507       From state actions are executed whenever a state takes a transition  on
508       a  character.   This  includes the error transition and a transition to
509       self.
510
511       expr >* action
512              Embed a from-state action into the start state.
513
514       expr <* action
515              Embed a from-state action into  every  state  except  the  start
516              state.
517
518       expr $* action
519              Embed a from-state action into all states.
520
521       expr %* action
522              Embed a from-state action into the final states.
523
524       expr @* action
525              Embed a from-state action into all states which are not final.
526
527       expr <>* action
528              Embed  a  from-state  action  into  all states which are not the
529              start state and are not final (middle states).
530
531       GROUP 6: Priority Assignment
532
533       Priorities are assigned to names within transitions. Only priorities on
534       the  same name are allowed to interact. In the first form of priorities
535       the name defaults to the name of the machine definition the priority is
536       assigned in.  Transitions do not have default priorities.
537
538       expr > int
539              Assigns  the  priority  int in all transitions leaving the start
540              state.
541
542       expr @ int
543              Assigns the priority int in all transitions that go into a final
544              state.
545
546       expr $ int
547              Assigns the priority int in all existing transitions.
548
549       expr % int
550              Assigns the priority int in all pending out transitions.
551
552       A  second  form of priority assignment allows the programmer to specify
553       the name to which the priority is assigned,  allowing  interactions  to
554       cross machine definition boundaries.
555
556       expr > (name,int)
557              Assigns  the priority int to name in all transitions leaving the
558              start state.
559
560       expr @ (name, int)
561              Assigns the priority int to name in all transitions that go into
562              a final state.
563
564       expr $ (name, int)
565              Assigns the priority int to name in all existing transitions.
566
567       expr % (name, int)
568              Assigns the priority int to name in all pending out transitions.
569
570       GROUP 7:
571
572       expr * Produces the kleene star of a machine. Matches zero or more rep‐
573              etitions of the machine.
574
575       expr **
576              Longest-Match Kleene Star. This version of kleene  star  puts  a
577              higher  priority  on staying in the machine over wrapping around
578              and starting over. This operator is equivalent to ( ( expr )  $0
579              %1 )*.
580
581       expr ? Produces  a  machine  that accepts the machine given or the null
582              string. This operator is equivalent to  ( expr | '' ).
583
584       expr + Produces the machine concatenated with the kleen star of itself.
585              Matches  one  or more repetitions of the machine.  This operator
586              is equivalent to ( expr . expr* ).
587
588       expr {n}
589              Produces a machine that matches exactly n repetitions of expr.
590
591       expr {,n}
592              Produces a machine that matches anywhere from zero to n  repeti‐
593              tions of expr.
594
595       expr {n,}
596              Produces a machine that matches n or more repetitions of expr.
597
598       expr {n,m}
599              Produces a machine that matches n to m repetitions of expr.
600
601       GROUP 8:
602
603       ! expr Produces  a  machine  that matches any string not matched by the
604              given machine.  This operator is equivalent to ( *extend -  expr
605              ).
606
607       ^ expr Character-Level  Negation.  Matches  any  single  character  not
608              matched by the single character machine expr.
609
610       GROUP 9:
611
612       ( expr )
613              Forces precedence on operators.
614

VALUES AVAILABLE IN CODE BLOCKS

616       fc     The current character. Equivalent to *p.
617
618       fpc    A pointer to the current character. Equivalent to p.
619
620       fcurs  An integer value representing the current state.
621
622       ftargs An integer value representing the target state.
623
624       fentry(<label>)
625              An integer value representing the entry point <label>.
626

STATEMENTS AVAILABLE IN CODE BLOCKS

628       fhold; Do not advance over the current character. Equivalent to --p;.
629
630       fexec <expr>;
631              Sets the current character to something else. Equivalent to p  =
632              (<expr>)-1;
633
634       fgoto <label>;
635              Jump to the machine defined by <label>.
636
637       fgoto *<expr>;
638              Jump  to  the  entry  point given by <expr>. The expression must
639              evaluate to an integer value representing a state.
640
641       fnext <label>;
642              Set the next state to be the entry  point  defined  by  <label>.
643              The  fnext  statement does not immediately jump to the specified
644              state. Any action code following the statement is executed.
645
646       fnext *<expr>;
647              Set the next state to be the entry point given  by  <expr>.  The
648              expression  must  evaluate  to  an  integer value representing a
649              state.
650
651       fcall <label>;
652              Call the machine defined by <label>. The next fret will jump  to
653              the target of the transition on which the action is invoked.
654
655       fcall *<expr>;
656              Call the entry point given by <expr>. The next fret will jump to
657              the target of the transition on which the action is invoked.
658
659       fret;  Return to the target state of the transition on which  the  last
660              fcall was made.
661
662       fbreak;
663              Save the current state and immediately break out of the machine.
664

CREDITS

666       Ragel  was  written  by Adrian Thurston <thurston@colm.net>. There have
667       been many contributors. Please see CREDITS file in source distribution.
668

SEE ALSO

670       re2c(1), flex(1)
671
672       Homepage: http://www.colm.net/open-source/ragel/
673
674
675
676Ragel @VERSION@                    @PUBDATE@                          RAGEL(1)
Impressum