1
2
3
4RAGEL(1) Ragel State Machine Compiler RAGEL(1)
5
6
7
9 ragel - compile regular languages into executable state machines
10
12 ragel [options] file
13
15 Ragel compiles executable finite state machines from regular languages.
16 Ragel can generate C, C++, Objective-C, D, Go, or Java code. Ragel
17 state machines can not only recognize byte sequences as regular expres‐
18 sion machines do, but can also execute code at arbitrary points in the
19 recognition of a regular language. User code is embedded using inline
20 operators that do not disrupt the regular language syntax.
21
22 The core language consists of standard regular expression operators,
23 such as union, concatenation and kleene star, accompanied by action em‐
24 bedding operators. Ragel also provides operators that let you control
25 any non-determinism that you create, construct scanners using the long‐
26 est match paradigm, and build state machines using the statechart
27 model. It is also possible to influence the execution of a state ma‐
28 chine from inside an embedded action by jumping or calling to other
29 parts of the machine and reprocessing input.
30
31 Ragel provides a very flexibile interface to the host language that at‐
32 tempts to place minimal restrictions on how the generated code is used
33 and integrated into the application. The generated code has no depen‐
34 dencies.
35
36
38 -h, -H, -?, --help
39 Display help and exit.
40
41 -v Print version information and exit.
42
43 -o file
44 Write output to file. If -o is not given, a default file name is
45 chosen by replacing the file extenstion of the input file. For
46 source files ending in .rh the suffix .h is used. For all other
47 source files a suffix based on the output language is used (.c,
48 .cpp, .m, etc.). If -o is not given for Graphviz output the gen‐
49 erated dot file is written to standard output.
50
51 -s Print some statistics on standard error.
52
53 --error-format=gnu
54 Print error messages using the format "file:line:column:" (de‐
55 fault)
56
57 --error-format=msvc
58 Print error messages using the format "file(line,column):"
59
60 -d Do not remove duplicate actions from action lists.
61
62 -I dir
63 Add dir to the list of directories to search for included and
64 imported files
65
66 -n Do not perform state minimization.
67
68 -m Perform minimization once, at the end of the state machine com‐
69 pilation.
70
71 -l Minimize after nearly every operation. Lists of like operations
72 such as unions are minimized once at the end. This is the de‐
73 fault minimization option.
74
75 -e Minimize after every operation.
76
77 -x Compile the state machines and emit an XML representation of the
78 host data and the machines.
79
80 -V Generate a dot file for Graphviz.
81
82 -p Display printable characters on labels.
83
84 -L Inhibit writing of #line directives.
85
86 -S <spec>
87 FSM specification to output.
88
89 -M <machine>
90 Machine definition/instantiation to output.
91
92 -C The host language is C, C++, Obj-C or Obj-C++. This is the de‐
93 fault host language option.
94
95 --asm --gas-x86-64-sys-v
96 GNU AS, x86_64, System V ABI ASM is generated in a code style
97 equiv to -G2
98
99 -D The host language is D.
100
101 -Z The host language is Go.
102
103 -J The host language is Java.
104
105 -R The host language is Ruby.
106
107 -A The host language is C#
108
109 -O The host language is OCaml.
110
111 -U The host language is Rust.
112
113 -Y The host language is Julia.
114
115 -K The host language is Crack.
116
117 -P The host language is JavaScript.
118
119 -T0 (C/D/Java/Ruby/C#) Generate a table driven FSM. This is the de‐
120 fault code style. The table driven FSM represents the state ma‐
121 chine as static data. There are tables of states, transitions,
122 indices and actions. The current state is stored in a variable.
123 The execution is a loop that looks that given the current state
124 and current character to process looks up the transition to take
125 using a binary search, executes any actions and moves to the
126 target state. In general, the table driven FSM produces a
127 smaller binary and requires a less expensive host language com‐
128 pile but results in slower running code. The table driven FSM is
129 suitable for any FSM.
130
131 -T1 (C/D/Ruby/C#) Generate a faster table driven FSM by expanding
132 action lists in the action execute code.
133
134 -F0 (C/D/Ruby/C#) Generate a flat table driven FSM. Transitions are
135 represented as an array indexed by the current alphabet charac‐
136 ter. This eliminates the need for a binary search to locate
137 transitions and produces faster code, however it is only suit‐
138 able for small alphabets.
139
140 -F1 (C/D/Ruby/C#) Generate a faster flat table driven FSM by expand‐
141 ing action lists in the action execute code.
142
143 -G0 (C/D/C#) Generate a goto driven FSM. The goto driven FSM repre‐
144 sents the state machine as a series of goto statements. While in
145 the machine, the current state is stored by the processor's in‐
146 struction pointer. The execution is a flat function where con‐
147 trol is passed from state to state using gotos. In general, the
148 goto FSM produces faster code but results in a larger binary and
149 a more expensive host language compile.
150
151 -G1 (C/D/C#) Generate a faster goto driven FSM by expanding action
152 lists in the action execute code.
153
154 -G2 (C/D/Go) Generate a really fast goto driven FSM by embedding ac‐
155 tion lists in the state machine control code.
156
157 --nfa-conds-depth=D
158 Search for high-cost conditions inside a prefix of the machine
159 (depth D from start state). Search is rooted at NFA union con‐
160 tructs.
161
162 --nfa-term-check
163 Search for condition-based general repetitions that will not
164 function properly and must be NFA reps. Search is rooted at NFA
165 union constructs.
166
167 --nfa-intermed-state-limit=L
168 Report fail if number of states exceeds this during compilation.
169
170 --nfa-final-state-limit=L
171 Report a fail if number states in final machine exceeds this.
172
173 --nfa-breadth-check=E1,E2,..
174 Report breadth cost of named entry points by (and start). Re‐
175 porting starts at NFA union contructs.
176
177 --input-histogram=FN
178 Input char histogram for breadth check. If unspecified a flat
179 histogram is used.
180
182 NOTE: This is a very brief description of Ragel input. Ragel is de‐
183 scribed in more detail in the user guide available from the homepage
184 (see below).
185
186 Ragel normally passes input files straight to the output. When it sees
187 an FSM specification that contains machine instantiations it stops to
188 generate the state machine. If there are write statements (such as
189 "write exec") then ragel emits the corresponding code. There can be any
190 number of FSM specifications in an input file. A multi-line FSM speci‐
191 fication starts with '%%{' and ends with '}%%'. A single line FSM spec‐
192 ification starts with %% and ends at the first newline.
193
195 Machine Name:
196 Set the the name of the machine. If given, it must be the first
197 statement.
198
199 Alphabet Type:
200 Set the data type of the alphabet.
201
202 GetKey:
203 Specify how to retrieve the alphabet character from the element
204 type.
205
206 Include:
207 Include a machine of same name as the current or of a different
208 name in either the current file or some other file.
209
210 Action Definition:
211 Define an action that can be invoked by the FSM.
212
213 Fsm Definition, Instantiation and Longest Match Instantiation:
214 Used to build FSMs. Syntax description in next few sections.
215
216 Access:
217 Specify how to access the persistent state machine variables.
218
219 Write: Write some component of the machine.
220
221 Variable:
222 Override the default variable names (p, pe, cs, act, etc).
223
225 The basic machines are the base operands of the regular language ex‐
226 pressions.
227
228 'hello'
229 Concat literal. Produces a concatenation of the characters in
230 the string. Supports escape sequences with '\'. The result
231 will have a start state and a transition to a new state for each
232 character in the string. The last state in the sequence will be
233 made final. To make the string case-insensitive, append an 'i'
234 to the string, as in 'cmd'i.
235
236 "hello"
237 Identical to single quote version.
238
239 [hello]
240 Or literal. Produces a union of characters. Supports character
241 ranges with '-', negating the sense of the union with an initial
242 '^' and escape sequences with '\'. The result will have two
243 states with a transition between them for each character or
244 range.
245
246 NOTE: '', "", and [] produce null FSMs. Null machines have one state
247 that is both a start state and a final state and match the zero length
248 string. A null machine may be created with the null builtin machine.
249
250 integer
251 Makes a two state machine with one transition on the given inte‐
252 ger number.
253
254 hex Makes a two state machine with one transition on the given hex‐
255 idecimal number.
256
257 /simple_regex/
258 A simple regular expression. Supports the notation '.', '*' and
259 '[]', character ranges with '-', negating the sense of an OR ex‐
260 pression with and initial '^' and escape sequences with '\'.
261 Also supports one trailing flag: i. Use it to produce a case-in‐
262 sensitive regular expression, as in /GET/i.
263
264 lit .. lit
265 Specifies a range. The allowable upper and lower bounds are con‐
266 cat literals of length one and number machines. For example,
267 0x10..0x20, 0..63, and 'a'..'z' are valid ranges.
268
269 variable_name
270 References the machine definition assigned to the variable name
271 given.
272
273 builtin_machine
274 There are several builtin machines available. They are all two
275 state machines for the purpose of matching common classes of
276 characters. They are:
277
278 any Any character in the alphabet.
279
280 ascii Ascii characters 0..127.
281
282 extend Ascii extended characters. This is the range -128..127
283 for signed alphabets and the range 0..255 for unsigned
284 alphabets.
285
286 alpha Alphabetic characters /[A-Za-z]/.
287
288 digit Digits /[0-9]/.
289
290 alnum Alpha numerics /[0-9A-Za-z]/.
291
292 lower Lowercase characters /[a-z]/.
293
294 upper Uppercase characters /[A-Z]/.
295
296 xdigit Hexidecimal digits /[0-9A-Fa-f]/.
297
298 cntrl Control characters 0..31, 127.
299
300 graph Graphical characters /[!-~]/.
301
302 print Printable characters /[ -~]/.
303
304 punct Punctuation. Graphical characters that are not alpha-nu‐
305 merics /[!-/:-@\[-`{-~]/.
306
307 space Whitespace /[\t\v\f\n\r ]/.
308
309 null Zero length string. Equivalent to '', "" and [].
310
311 empty Empty set. Matches nothing.
312
314 Operators are grouped by precedence, group 1 being the lowest and group
315 6 the highest.
316
317 GROUP 1:
318
319 expr , expr
320 Join machines together without drawing any transitions, setting
321 up a start state or any final states. Start state must be ex‐
322 plicitly specified with the "start" label. Final states may be
323 specified with the an epsilon transitions to the implicitly cre‐
324 ated "final" state.
325
326 GROUP 2:
327
328 expr | expr
329 Produces a machine that matches any string in machine one or ma‐
330 chine two.
331
332 expr & expr
333 Produces a machine that matches any string that is in both ma‐
334 chine one and machine two.
335
336 expr - expr
337 Produces a machine that matches any string that is in machine
338 one but not in machine two.
339
340 expr -- expr
341 Strong Subtraction. Matches any string in machine one that does
342 not have any string in machine two as a substring.
343
344 GROUP 3:
345
346 expr . expr
347 Produces a machine that matches all the strings in machine one
348 followed by all the strings in machine two.
349
350 expr :> expr
351 Entry-Guarded Concatenation: terminates machine one upon entry
352 to machine two.
353
354 expr :>> expr
355 Finish-Guarded Concatenation: terminates machine one when ma‐
356 chine two finishes.
357
358 expr <: expr
359 Left-Guarded Concatenation: gives a higher priority to machine
360 one.
361
362 NOTE: Concatenation is the default operator. Two machines next to each
363 other with no operator between them results in the concatenation opera‐
364 tion.
365
366 GROUP 4:
367
368 label: expr
369 Attaches a label to an expression. Labels can be used by epsilon
370 transitions and fgoto and fcall statements in actions. Also note
371 that the referencing of a machine definition causes the implicit
372 creation of label by the same name.
373
374 GROUP 5:
375
376 expr -> label
377 Draws an epsilon transition to the state defined by label. Label
378 must be a name in the current scope. Epsilon transitions are re‐
379 solved when comma operators are evaluated and at the root of the
380 expression tree of machine assignment/instantiation.
381
382 GROUP 6: Actions
383
384 An action may be a name predefined with an action statement or may be
385 specified directly with '{' and '}' in the expression.
386
387 expr > action
388 Embeds action into starting transitions.
389
390 expr @ action
391 Embeds action into transitions that go into a final state.
392
393 expr $ action
394 Embeds action into all transitions. Does not include pending out
395 transitions.
396
397 expr % action
398 Embeds action into pending out transitions from final states.
399
400 GROUP 6: EOF Actions
401
402 When a machine's finish routine is called the current state's EOF ac‐
403 tions are executed.
404
405 expr >/ action
406 Embed an EOF action into the start state.
407
408 expr </ action
409 Embed an EOF action into all states except the start state.
410
411 expr $/ action
412 Embed an EOF action into all states.
413
414 expr %/ action
415 Embed an EOF action into final states.
416
417 expr @/ action
418 Embed an EOF action into all states that are not final.
419
420 expr <>/ action
421 Embed an EOF action into all states that are not the start state
422 and that are not final (middle states).
423
424 GROUP 6: Global Error Actions
425
426 Global error actions are stored in states until the final state machine
427 has been fully constructed. They are then transferred to error transi‐
428 tions, giving the effect of a default action.
429
430 expr >! action
431 Embed a global error action into the start state.
432
433 expr <! action
434 Embed a global error action into all states except the start
435 state.
436
437 expr $! action
438 Embed a global error action into all states.
439
440 expr %! action
441 Embed a global error action into the final states.
442
443 expr @! action
444 Embed a global error action into all states which are not final.
445
446 expr <>! action
447 Embed a global error action into all states which are not the
448 start state and are not final (middle states).
449
450 GROUP 6: Local Error Actions
451
452 Local error actions are stored in states until the named machine is
453 fully constructed. They are then transferred to error transitions, giv‐
454 ing the effect of a default action for a section of the total machine.
455 Note that the name may be omitted, in which case the action will be
456 transferred to error actions upon construction of the current machine.
457
458 expr >^ action
459 Embed a local error action into the start state.
460
461 expr <^ action
462 Embed a local error action into all states except the start
463 state.
464
465 expr $^ action
466 Embed a local error action into all states.
467
468 expr %^ action
469 Embed a local error action into the final states.
470
471 expr @^ action
472 Embed a local error action into all states which are not final.
473
474 expr <>^ action
475 Embed a local error action into all states which are not the
476 start state and are not final (middle states).
477
478 GROUP 6: To-State Actions
479
480 To state actions are stored in states and executed any time the machine
481 moves into a state. This includes regular transitions, and transfers of
482 control such as fgoto. Note that setting the current state from outside
483 the machine (for example during initialization) does not count as a
484 transition into a state.
485
486 expr >~ action
487 Embed a to-state action action into the start state.
488
489 expr <~ action
490 Embed a to-state action into all states except the start state.
491
492 expr $~ action
493 Embed a to-state action into all states.
494
495 expr %~ action
496 Embed a to-state action into the final states.
497
498 expr @~ action
499 Embed a to-state action into all states which are not final.
500
501 expr <>~ action
502 Embed a to-state action into all states which are not the start
503 state and are not final (middle states).
504
505 GROUP 6: From-State Actions
506
507 From state actions are executed whenever a state takes a transition on
508 a character. This includes the error transition and a transition to
509 self.
510
511 expr >* action
512 Embed a from-state action into the start state.
513
514 expr <* action
515 Embed a from-state action into every state except the start
516 state.
517
518 expr $* action
519 Embed a from-state action into all states.
520
521 expr %* action
522 Embed a from-state action into the final states.
523
524 expr @* action
525 Embed a from-state action into all states which are not final.
526
527 expr <>* action
528 Embed a from-state action into all states which are not the
529 start state and are not final (middle states).
530
531 GROUP 6: Priority Assignment
532
533 Priorities are assigned to names within transitions. Only priorities on
534 the same name are allowed to interact. In the first form of priorities
535 the name defaults to the name of the machine definition the priority is
536 assigned in. Transitions do not have default priorities.
537
538 expr > int
539 Assigns the priority int in all transitions leaving the start
540 state.
541
542 expr @ int
543 Assigns the priority int in all transitions that go into a final
544 state.
545
546 expr $ int
547 Assigns the priority int in all existing transitions.
548
549 expr % int
550 Assigns the priority int in all pending out transitions.
551
552 A second form of priority assignment allows the programmer to specify
553 the name to which the priority is assigned, allowing interactions to
554 cross machine definition boundaries.
555
556 expr > (name,int)
557 Assigns the priority int to name in all transitions leaving the
558 start state.
559
560 expr @ (name, int)
561 Assigns the priority int to name in all transitions that go into
562 a final state.
563
564 expr $ (name, int)
565 Assigns the priority int to name in all existing transitions.
566
567 expr % (name, int)
568 Assigns the priority int to name in all pending out transitions.
569
570 GROUP 7:
571
572 expr * Produces the kleene star of a machine. Matches zero or more rep‐
573 etitions of the machine.
574
575 expr **
576 Longest-Match Kleene Star. This version of kleene star puts a
577 higher priority on staying in the machine over wrapping around
578 and starting over. This operator is equivalent to ( ( expr ) $0
579 %1 )*.
580
581 expr ? Produces a machine that accepts the machine given or the null
582 string. This operator is equivalent to ( expr | '' ).
583
584 expr + Produces the machine concatenated with the kleen star of itself.
585 Matches one or more repetitions of the machine. This operator
586 is equivalent to ( expr . expr* ).
587
588 expr {n}
589 Produces a machine that matches exactly n repetitions of expr.
590
591 expr {,n}
592 Produces a machine that matches anywhere from zero to n repeti‐
593 tions of expr.
594
595 expr {n,}
596 Produces a machine that matches n or more repetitions of expr.
597
598 expr {n,m}
599 Produces a machine that matches n to m repetitions of expr.
600
601 GROUP 8:
602
603 ! expr Produces a machine that matches any string not matched by the
604 given machine. This operator is equivalent to ( *extend - expr
605 ).
606
607 ^ expr Character-Level Negation. Matches any single character not
608 matched by the single character machine expr.
609
610 GROUP 9:
611
612 ( expr )
613 Forces precedence on operators.
614
616 fc The current character. Equivalent to *p.
617
618 fpc A pointer to the current character. Equivalent to p.
619
620 fcurs An integer value representing the current state.
621
622 ftargs An integer value representing the target state.
623
624 fentry(<label>)
625 An integer value representing the entry point <label>.
626
628 fhold; Do not advance over the current character. Equivalent to --p;.
629
630 fexec <expr>;
631 Sets the current character to something else. Equivalent to p =
632 (<expr>)-1;
633
634 fgoto <label>;
635 Jump to the machine defined by <label>.
636
637 fgoto *<expr>;
638 Jump to the entry point given by <expr>. The expression must
639 evaluate to an integer value representing a state.
640
641 fnext <label>;
642 Set the next state to be the entry point defined by <label>.
643 The fnext statement does not immediately jump to the specified
644 state. Any action code following the statement is executed.
645
646 fnext *<expr>;
647 Set the next state to be the entry point given by <expr>. The
648 expression must evaluate to an integer value representing a
649 state.
650
651 fcall <label>;
652 Call the machine defined by <label>. The next fret will jump to
653 the target of the transition on which the action is invoked.
654
655 fcall *<expr>;
656 Call the entry point given by <expr>. The next fret will jump to
657 the target of the transition on which the action is invoked.
658
659 fret; Return to the target state of the transition on which the last
660 fcall was made.
661
662 fbreak;
663 Save the current state and immediately break out of the machine.
664
666 Ragel was written by Adrian Thurston <thurston@colm.net>. There have
667 been many contributors. Please see CREDITS file in source distribution.
668
670 re2c(1), flex(1)
671
672 Homepage: http://www.colm.net/open-source/ragel/
673
674
675
676Ragel @VERSION@ @PUBDATE@ RAGEL(1)