RAGEL(1) Ragel State Machine Compiler RAGEL(1)

8 ragel - compile regular languages into executable state machines
9
11 ragel [options] file
12
Ragel compiles executable finite state machines from regular languages.
Ragel can generate C, C++, Objective-C, D, Java, Ruby, or C# code. Ragel
state machines can not only recognize byte sequences as regular
expression machines do, but can also execute code at arbitrary points in
the recognition of a regular language. User code is embedded using
inline operators that do not disrupt the regular language syntax.
20
The core language consists of standard regular expression operators,
such as union, concatenation and Kleene star, accompanied by action
embedding operators. Ragel also provides operators that let you control
any non-determinism that you create, construct scanners using the
longest-match paradigm, and build state machines using the statechart
model. It is also possible to influence the execution of a state
machine from inside an embedded action by jumping or calling to other
parts of the machine and reprocessing input.
29
Ragel provides a very flexible interface to the host language that
attempts to place minimal restrictions on how the generated code is
used and integrated into the application. The generated code has no
dependencies.
34
35
37 -h, -H, -?, --help
38 Display help and exit.
39
40 -v Print version information and exit.
41
-o file
Write output to file. If -o is not given, a default file name is
chosen by replacing the file extension of the input file. For
source files ending in .rh the suffix .h is used. For all other
source files a suffix based on the output language is used (.c,
.cpp, .m, etc.). If -o is not given for Graphviz output, the
generated dot file is written to standard output.
49
50 -s Print some statistics on standard error.
51
52 --error-format=gnu
53 Print error messages using the format "file:line:column:"
54 (default)
55
56 --error-format=msvc
57 Print error messages using the format "file(line,column):"
58
59 -d Do not remove duplicate actions from action lists.
60
-I dir
Add dir to the list of directories to search for included and
imported files.
64
65 -n Do not perform state minimization.
66
67 -m Perform minimization once, at the end of the state machine com‐
68 pilation.
69
70 -l Minimize after nearly every operation. Lists of like operations
71 such as unions are minimized once at the end. This is the
72 default minimization option.
73
74 -e Minimize after every operation.
75
76 -x Compile the state machines and emit an XML representation of the
77 host data and the machines.
78
79 -V Generate a dot file for Graphviz.
80
81 -p Display printable characters on labels.
82
83 -S <spec>
84 FSM specification to output.
85
86 -M <machine>
87 Machine definition/instantiation to output.
88
89 -C The host language is C, C++, Obj-C or Obj-C++. This is the
90 default host language option.
91
92 -D The host language is D.
93
94 -J The host language is Java.
95
96 -R The host language is Ruby.
97
98 -L Inhibit writing of #line directives.
99
-T0 (C/D/Java/Ruby/C#) Generate a table driven FSM. This is the
default code style. The table driven FSM represents the state
machine as static data. There are tables of states, transitions,
indices and actions. The current state is stored in a variable.
Execution is a loop that, given the current state and the current
character to process, looks up the transition to take using a
binary search, executes any actions and moves to the target
state. In general, the table driven FSM produces a smaller binary
and requires a less expensive host language compile but results
in slower running code. The table driven FSM is suitable for any
FSM.
111
112 -T1 (C/D/Ruby/C#) Generate a faster table driven FSM by expanding
113 action lists in the action execute code.
114
-F0 (C/D/Ruby/C#) Generate a flat table driven FSM. Transitions are
represented as an array indexed by the current alphabet character.
This eliminates the need for a binary search to locate transitions
and produces faster code; however, it is only suitable for small
alphabets.
120
121 -F1 (C/D/Ruby/C#) Generate a faster flat table driven FSM by expand‐
122 ing action lists in the action execute code.
123
124 -G0 (C/D/C#) Generate a goto driven FSM. The goto driven FSM repre‐
125 sents the state machine as a series of goto statements. While in
126 the machine, the current state is stored by the processor's
127 instruction pointer. The execution is a flat function where con‐
128 trol is passed from state to state using gotos. In general, the
129 goto FSM produces faster code but results in a larger binary and
130 a more expensive host language compile.
131
132 -G1 (C/D/C#) Generate a faster goto driven FSM by expanding action
133 lists in the action execute code.
134
135 -G2 (C/D) Generate a really fast goto driven FSM by embedding action
136 lists in the state machine control code.
137
138 -P<N> (C/D) N-Way Split really fast goto-driven FSM.
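
The three families of code style described above can be compared with a
small hand-written sketch. This is not Ragel output; it only illustrates
the ideas behind the table (-T), flat (-F) and goto (-G) styles for one
hypothetical machine, [+\-]? [0-9]+, with no embedded actions. All names
in it (match_table, match_flat, match_goto and the tables) are invented
for illustration, and the character ranges assume an ASCII host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical machine used by all three sketches: [+\-]? [0-9]+
 * States: 0 = start, 1 = after sign, 2 = in digits (final). */
enum { ERR = -1, START = 0, FINAL = 2 };

/* Table style: sorted key ranges per state, found by binary search. */
struct range { unsigned char lo, hi; int target; };
static const struct range ranges[] = {
    { '+', '+', 1 }, { '-', '-', 1 }, { '0', '9', 2 },  /* state 0 */
    { '0', '9', 2 },                                    /* state 1 */
    { '0', '9', 2 },                                    /* state 2 */
};
static const int first_range[] = { 0, 3, 4 };
static const int num_ranges[]  = { 3, 1, 1 };

static int table_next(int cs, unsigned char c)
{
    int lo = first_range[cs], hi = lo + num_ranges[cs] - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (c < ranges[mid].lo)      hi = mid - 1;
        else if (c > ranges[mid].hi) lo = mid + 1;
        else return ranges[mid].target;
    }
    return ERR;
}

/* Flat style: one array slot per key in '+'..'9', no search needed. */
enum { KEY_LO = '+', KEY_HI = '9' };
#define DIGITS10 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
static const int flat[3][KEY_HI - KEY_LO + 1] = {
    {   1, ERR,   1, ERR, ERR, DIGITS10 },  /* state 0: + , - . / 0..9 */
    { ERR, ERR, ERR, ERR, ERR, DIGITS10 },  /* state 1 */
    { ERR, ERR, ERR, ERR, ERR, DIGITS10 },  /* state 2 */
};
#undef DIGITS10

static int flat_next(int cs, unsigned char c)
{
    return (c < KEY_LO || c > KEY_HI) ? ERR : flat[cs][c - KEY_LO];
}

/* Shared driver loop: the current state is held in the variable cs. */
static int run(int (*next)(int, unsigned char), const char *p, size_t len)
{
    int cs = START;
    for (const char *pe = p + len; p != pe && cs != ERR; p++)
        cs = next(cs, (unsigned char)*p);
    return cs >= FINAL;
}

int match_table(const char *p, size_t len)
{
    assert(p != NULL);
    return run(table_next, p, len);
}

int match_flat(const char *p, size_t len)
{
    assert(p != NULL);
    return run(flat_next, p, len);
}

/* Goto style: no state variable; the state is the position in the code. */
int match_goto(const char *p, size_t len)
{
    assert(p != NULL);
    const char *pe = p + len;
    char c;

    /* state 0 */
    if (p == pe) return 0;
    c = *p++;
    if (c == '+' || c == '-') goto st1;
    if (c >= '0' && c <= '9') goto st2;
    return 0;
st1:
    if (p == pe) return 0;
    c = *p++;
    if (c >= '0' && c <= '9') goto st2;
    return 0;
st2:
    if (p == pe) return 1;
    c = *p++;
    if (c >= '0' && c <= '9') goto st2;
    return 0;
}
```

All three functions accept the same strings. The flat table spends one
array slot per key in the covered span for every state, which is why
that style only pays off for small alphabets.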
139
140
142 NOTE: This is a very brief description of Ragel input. Ragel is
143 described in more detail in the user guide available from the homepage
144 (see below).
145
Ragel normally passes input files straight to the output. When it sees
an FSM specification that contains machine instantiations it stops to
generate the state machine. If there are write statements (such as
"write exec") then Ragel emits the corresponding code. There can be any
number of FSM specifications in an input file. A multi-line FSM
specification starts with '%%{' and ends with '}%%'. A single-line FSM
specification starts with '%%' and ends at the first newline.
153
155 Machine Name:
Set the name of the machine. If given, it must be the first
statement.
158
159 Alphabet Type:
160 Set the data type of the alphabet.
161
162 GetKey:
163 Specify how to retrieve the alphabet character from the element
164 type.
165
166 Include:
Include a machine with the same name as the current machine, or
with a different name, from either the current file or some other
file.
169
170 Action Definition:
171 Define an action that can be invoked by the FSM.
172
173 Fsm Definition, Instantiation and Longest Match Instantiation:
174 Used to build FSMs. Syntax description in next few sections.
175
176 Access:
177 Specify how to access the persistent state machine variables.
178
179 Write: Write some component of the machine.
180
181 Variable:
182 Override the default variable names (p, pe, cs, act, etc).
183
185 The basic machines are the base operands of the regular language
186 expressions.
187
188 'hello'
189 Concat literal. Produces a concatenation of the characters in
190 the string. Supports escape sequences with '\'. The result
191 will have a start state and a transition to a new state for each
192 character in the string. The last state in the sequence will be
193 made final. To make the string case-insensitive, append an 'i'
194 to the string, as in 'cmd'i.
195
196 "hello"
197 Identical to single quote version.
198
199 [hello]
200 Or literal. Produces a union of characters. Supports character
201 ranges with '-', negating the sense of the union with an initial
202 '^' and escape sequences with '\'. The result will have two
203 states with a transition between them for each character or
204 range.
205
206 NOTE: '', "", and [] produce null FSMs. Null machines have one state
207 that is both a start state and a final state and match the zero length
208 string. A null machine may be created with the null builtin machine.
209
210 integer
211 Makes a two state machine with one transition on the given inte‐
212 ger number.
213
hex Makes a two state machine with one transition on the given
hexadecimal number.
216
217 /simple_regex/
A simple regular expression. Supports the notation '.', '*' and
'[]', character ranges with '-', negating the sense of an OR
expression with an initial '^' and escape sequences with '\'.
Also supports one trailing flag: i. Use it to produce a
case-insensitive regular expression, as in /GET/i.
223
224 lit .. lit
225 Specifies a range. The allowable upper and lower bounds are con‐
226 cat literals of length one and number machines. For example,
227 0x10..0x20, 0..63, and 'a'..'z' are valid ranges.
228
229 variable_name
230 References the machine definition assigned to the variable name
231 given.
232
233 builtin_machine
234 There are several builtin machines available. They are all two
235 state machines for the purpose of matching common classes of
236 characters. They are:
237
238 any Any character in the alphabet.
239
ascii ASCII characters 0..127.

extend Extended ASCII characters. This is the range -128..127
for signed alphabets and the range 0..255 for unsigned
alphabets.
245
246 alpha Alphabetic characters /[A-Za-z]/.
247
248 digit Digits /[0-9]/.
249
250 alnum Alpha numerics /[0-9A-Za-z]/.
251
252 lower Lowercase characters /[a-z]/.
253
254 upper Uppercase characters /[A-Z]/.
255
xdigit Hexadecimal digits /[0-9A-Fa-f]/.
257
258 cntrl Control characters 0..31.
259
260 graph Graphical characters /[!-~]/.
261
262 print Printable characters /[ -~]/.
263
264 punct Punctuation. Graphical characters that are not alpha-
265 numerics /[!-/:-@\[-`{-~]/.
266
267 space Whitespace /[\t\v\f\n\r ]/.
268
269 null Zero length string. Equivalent to '', "" and [].
270
271 empty Empty set. Matches nothing.
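
The character classes listed above can be written out as plain C
predicates. The sketch below is only an illustration of the documented
ranges, not code taken from Ragel; the rl_ prefix and the function names
are invented here, and the character literals assume an ASCII host. The
self-check confirms that the punct range list is exactly "graphical
characters that are not alphanumerics".

```c
#include <assert.h>

static int in_range(int c, int lo, int hi) { return c >= lo && c <= hi; }

/* One predicate per builtin machine, using the ranges given in the text. */
int rl_ascii(int c)  { return in_range(c, 0, 127); }
int rl_alpha(int c)  { return in_range(c, 'A', 'Z') || in_range(c, 'a', 'z'); }
int rl_digit(int c)  { return in_range(c, '0', '9'); }
int rl_alnum(int c)  { return rl_digit(c) || rl_alpha(c); }
int rl_lower(int c)  { return in_range(c, 'a', 'z'); }
int rl_upper(int c)  { return in_range(c, 'A', 'Z'); }
int rl_xdigit(int c) { return rl_digit(c) || in_range(c, 'A', 'F')
                                          || in_range(c, 'a', 'f'); }
int rl_cntrl(int c)  { return in_range(c, 0, 31); }
int rl_graph(int c)  { return in_range(c, '!', '~'); }
int rl_print(int c)  { return in_range(c, ' ', '~'); }
int rl_punct(int c)  { return in_range(c, '!', '/') || in_range(c, ':', '@')
                           || in_range(c, '[', '`') || in_range(c, '{', '~'); }
int rl_space(int c)  { return c == '\t' || c == '\v' || c == '\f'
                           || c == '\n' || c == '\r' || c == ' '; }

/* Check, for every ASCII code, that punct == graph minus alnum. */
void check_punct_definition(void)
{
    for (int c = 0; c <= 127; c++)
        assert(rl_punct(c) == (rl_graph(c) && !rl_alnum(c)));
}
```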
272
274 Operators are grouped by precedence, group 1 being the lowest and group
275 6 the highest.
276
277 GROUP 1:
278
279 expr , expr
Join machines together without drawing any transitions, setting
up a start state or any final states. The start state must be
explicitly specified with the "start" label. Final states may be
specified with epsilon transitions to the implicitly created
"final" state.
285
286 GROUP 2:
287
288 expr | expr
289 Produces a machine that matches any string in machine one or
290 machine two.
291
292 expr & expr
293 Produces a machine that matches any string that is in both
294 machine one and machine two.
295
296 expr - expr
297 Produces a machine that matches any string that is in machine
298 one but not in machine two.
299
300 expr -- expr
301 Strong Subtraction. Matches any string in machine one that does
302 not have any string in machine two as a substring.
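
The meaning of the four GROUP 2 operators can be illustrated by treating
each machine as a membership test on whole strings. The sketch below is
an assumption-laden illustration only: machine one is taken to be
[a-z]+ and machine two the single literal 'cat', and every function name
is invented. Because machine two contains exactly one string, strong
subtraction reduces to a substring search here; that shortcut would not
hold for a machine two with more than one string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical machine one: [a-z]+ (one or more lowercase letters). */
static int in_one(const char *s)
{
    if (*s == '\0') return 0;
    for (; *s != '\0'; s++)
        if (*s < 'a' || *s > 'z') return 0;
    return 1;
}

/* Hypothetical machine two: the single literal 'cat'. */
static int in_two(const char *s) { return strcmp(s, "cat") == 0; }

/* one | two : strings in either machine. */
int in_union(const char *s)
{
    assert(s != NULL);
    return in_one(s) || in_two(s);
}

/* one & two : strings in both machines. */
int in_intersection(const char *s)
{
    assert(s != NULL);
    return in_one(s) && in_two(s);
}

/* one - two : strings in machine one but not in machine two. */
int in_difference(const char *s)
{
    assert(s != NULL);
    return in_one(s) && !in_two(s);
}

/* one -- two : strings in machine one with no substring from machine two. */
int in_strong_difference(const char *s)
{
    assert(s != NULL);
    return in_one(s) && strstr(s, "cat") == NULL;
}
```

Under these assumptions "concat" is accepted by the difference, since it
is not equal to 'cat', but rejected by the strong difference, since it
contains 'cat' as a substring.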
303
304 GROUP 3:
305
306 expr . expr
307 Produces a machine that matches all the strings in machine one
308 followed by all the strings in machine two.
309
310 expr :> expr
311 Entry-Guarded Concatenation: terminates machine one upon entry
312 to machine two.
313
314 expr :>> expr
315 Finish-Guarded Concatenation: terminates machine one when
316 machine two finishes.
317
318 expr <: expr
319 Left-Guarded Concatenation: gives a higher priority to machine
320 one.
321
322 NOTE: Concatenation is the default operator. Two machines next to each
323 other with no operator between them results in the concatenation opera‐
324 tion.
325
326 GROUP 4:
327
328 label: expr
Attaches a label to an expression. Labels can be used by epsilon
transitions and by fgoto and fcall statements in actions. Note
also that referencing a machine definition implicitly creates a
label of the same name.
333
334 GROUP 5:
335
336 expr -> label
337 Draws an epsilon transition to the state defined by label. Label
338 must be a name in the current scope. Epsilon transitions are
339 resolved when comma operators are evaluated and at the root of
340 the expression tree of machine assignment/instantiation.
341
342 GROUP 6: Actions
343
344 An action may be a name predefined with an action statement or may be
345 specified directly with '{' and '}' in the expression.
346
347 expr > action
348 Embeds action into starting transitions.
349
350 expr @ action
351 Embeds action into transitions that go into a final state.
352
353 expr $ action
354 Embeds action into all transitions. Does not include pending out
355 transitions.
356
357 expr % action
358 Embeds action into pending out transitions from final states.
359
360 GROUP 6: EOF Actions
361
362 When a machine's finish routine is called the current state's EOF
363 actions are executed.
364
365 expr >/ action
366 Embed an EOF action into the start state.
367
368 expr </ action
369 Embed an EOF action into all states except the start state.
370
371 expr $/ action
372 Embed an EOF action into all states.
373
374 expr %/ action
375 Embed an EOF action into final states.
376
377 expr @/ action
378 Embed an EOF action into all states that are not final.
379
380 expr <>/ action
381 Embed an EOF action into all states that are not the start state
382 and that are not final (middle states).
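
The idea of an EOF action can be sketched in C as a per-state table of
callbacks that a finish routine consults once input is exhausted. This
is a hand-written illustration, not Ragel output: the word-counting
machine, the struct ctx type and the names machine_exec, machine_finish
and count_words are all assumptions made for the example. A word that is
still open when input ends is counted only because the in-word state
carries an EOF action.

```c
#include <assert.h>
#include <stddef.h>

enum { ST_GAP, ST_WORD, NUM_STATES };

struct ctx { int words; };

/* Action: one complete word has been seen. */
static void end_word(struct ctx *ctx) { ctx->words++; }

/* EOF actions indexed by state; only the in-word state has one. */
typedef void (*eof_action)(struct ctx *);
static const eof_action eof_actions[NUM_STATES] = { NULL, end_word };

/* Process a buffer and return the state the machine stopped in. */
static int machine_exec(struct ctx *ctx, int cs, const char *p, size_t len)
{
    for (const char *pe = p + len; p != pe; p++) {
        if (*p == ' ') {
            if (cs == ST_WORD)
                end_word(ctx);      /* a space closes the current word */
            cs = ST_GAP;
        } else {
            cs = ST_WORD;
        }
    }
    return cs;
}

/* Finish routine: run the EOF action of the current state, if any. */
static void machine_finish(struct ctx *ctx, int cs)
{
    if (eof_actions[cs] != NULL)
        eof_actions[cs](ctx);
}

int count_words(const char *text, size_t len)
{
    assert(text != NULL);
    struct ctx ctx = { 0 };
    int cs = machine_exec(&ctx, ST_GAP, text, len);
    machine_finish(&ctx, cs);
    return ctx.words;
}
```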
383
384 GROUP 6: Global Error Actions
385
386 Global error actions are stored in states until the final state machine
387 has been fully constructed. They are then transferred to error transi‐
388 tions, giving the effect of a default action.
389
390 expr >! action
391 Embed a global error action into the start state.
392
393 expr <! action
394 Embed a global error action into all states except the start
395 state.
396
397 expr $! action
398 Embed a global error action into all states.
399
400 expr %! action
401 Embed a global error action into the final states.
402
403 expr @! action
404 Embed a global error action into all states which are not final.
405
406 expr <>! action
407 Embed a global error action into all states which are not the
408 start state and are not final (middle states).
409
410 GROUP 6: Local Error Actions
411
412 Local error actions are stored in states until the named machine is
413 fully constructed. They are then transferred to error transitions, giv‐
414 ing the effect of a default action for a section of the total machine.
415 Note that the name may be omitted, in which case the action will be
416 transferred to error actions upon construction of the current machine.
417
418 expr >^ action
419 Embed a local error action into the start state.
420
421 expr <^ action
422 Embed a local error action into all states except the start
423 state.
424
425 expr $^ action
426 Embed a local error action into all states.
427
428 expr %^ action
429 Embed a local error action into the final states.
430
431 expr @^ action
432 Embed a local error action into all states which are not final.
433
434 expr <>^ action
435 Embed a local error action into all states which are not the
436 start state and are not final (middle states).
437
438 GROUP 6: To-State Actions
439
To-state actions are stored in states and executed any time the machine
moves into a state. This includes regular transitions and transfers of
control such as fgoto. Note that setting the current state from outside
the machine (for example during initialization) does not count as a
transition into a state.
445
expr >~ action
Embed a to-state action into the start state.
448
449 expr <~ action
450 Embed a to-state action into all states except the start state.
451
452 expr $~ action
453 Embed a to-state action into all states.
454
455 expr %~ action
456 Embed a to-state action into the final states.
457
458 expr @~ action
459 Embed a to-state action into all states which are not final.
460
461 expr <>~ action
462 Embed a to-state action into all states which are not the start
463 state and are not final (middle states).
464
465 GROUP 6: From-State Actions
466
From-state actions are executed whenever a state takes a transition on
a character. This includes the error transition and a transition to
self.
470
471 expr >* action
472 Embed a from-state action into the start state.
473
474 expr <* action
475 Embed a from-state action into every state except the start
476 state.
477
478 expr $* action
479 Embed a from-state action into all states.
480
481 expr %* action
482 Embed a from-state action into the final states.
483
484 expr @* action
485 Embed a from-state action into all states which are not final.
486
487 expr <>* action
488 Embed a from-state action into all states which are not the
489 start state and are not final (middle states).
490
491 GROUP 6: Priority Assignment
492
493 Priorities are assigned to names within transitions. Only priorities on
494 the same name are allowed to interact. In the first form of priorities
495 the name defaults to the name of the machine definition the priority is
496 assigned in. Transitions do not have default priorities.
497
498 expr > int
499 Assigns the priority int in all transitions leaving the start
500 state.
501
502 expr @ int
503 Assigns the priority int in all transitions that go into a final
504 state.
505
506 expr $ int
507 Assigns the priority int in all existing transitions.
508
509 expr % int
510 Assigns the priority int in all pending out transitions.
511
512 A second form of priority assignment allows the programmer to specify
513 the name to which the priority is assigned, allowing interactions to
514 cross machine definition boundaries.
515
516 expr > (name,int)
517 Assigns the priority int to name in all transitions leaving the
518 start state.
519
520 expr @ (name, int)
521 Assigns the priority int to name in all transitions that go into
522 a final state.
523
524 expr $ (name, int)
525 Assigns the priority int to name in all existing transitions.
526
527 expr % (name, int)
528 Assigns the priority int to name in all pending out transitions.
529
530 GROUP 7:
531
expr * Produces the Kleene star of a machine. Matches zero or more
repetitions of the machine.

expr **
Longest-Match Kleene Star. This version of Kleene star puts a
higher priority on staying in the machine over wrapping around
and starting over. This operator is equivalent to ( ( expr ) $0
%1 )*.

expr ? Produces a machine that accepts the machine given or the null
string. This operator is equivalent to ( expr | '' ).

expr + Produces the machine concatenated with the Kleene star of itself.
Matches one or more repetitions of the machine. This operator
is equivalent to ( expr . expr* ).
547
548 expr {n}
549 Produces a machine that matches exactly n repetitions of expr.
550
551 expr {,n}
552 Produces a machine that matches anywhere from zero to n repeti‐
553 tions of expr.
554
555 expr {n,}
556 Produces a machine that matches n or more repetitions of expr.
557
558 expr {n,m}
559 Produces a machine that matches n to m repetitions of expr.
560
561 GROUP 8:
562
! expr Produces a machine that matches any string not matched by the
given machine. This operator is equivalent to ( extend* - expr ).
566
567 ^ expr Character-Level Negation. Matches any single character not
568 matched by the single character machine expr.
569
570 GROUP 9:
571
572 ( expr )
573 Forces precedence on operators.
574
576 fc The current character. Equivalent to *p.
577
578 fpc A pointer to the current character. Equivalent to p.
579
580 fcurs An integer value representing the current state.
581
582 ftargs An integer value representing the target state.
583
584 fentry(<label>)
585 An integer value representing the entry point <label>.
586
588 fhold; Do not advance over the current character. Equivalent to --p;.
589
590 fexec <expr>;
Sets the next character to process to the one at <expr>.
Equivalent to p = (<expr>)-1; the machine then advances p as
usual.
593
594 fgoto <label>;
595 Jump to the machine defined by <label>.
596
597 fgoto *<expr>;
598 Jump to the entry point given by <expr>. The expression must
599 evaluate to an integer value representing a state.
600
601 fnext <label>;
602 Set the next state to be the entry point defined by <label>.
603 The fnext statement does not immediately jump to the specified
604 state. Any action code following the statement is executed.
605
606 fnext *<expr>;
607 Set the next state to be the entry point given by <expr>. The
608 expression must evaluate to an integer value representing a
609 state.
610
611 fcall <label>;
612 Call the machine defined by <label>. The next fret will jump to
613 the target of the transition on which the action is invoked.
614
615 fcall *<expr>;
616 Call the entry point given by <expr>. The next fret will jump to
617 the target of the transition on which the action is invoked.
618
619 fret; Return to the target state of the transition on which the last
620 fcall was made.
621
622 fbreak;
623 Save the current state and immediately break out of the machine.
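
The call and return behaviour of fcall and fret can be sketched with an
explicit stack of states. The code below is a hand-written illustration
in plain C, not Ragel output, and the machine it runs is hypothetical: a
main state counts characters, and every '(' calls a sub-machine that
skips input up to the matching ')'. The state names, STACK_MAX and
count_outside_parens are invented for the example; stack and top stand
in for the call stack variables that a host program supplies.

```c
#include <assert.h>
#include <stddef.h>

enum { ST_MAIN, ST_PAREN };
enum { STACK_MAX = 16 };

/* Returns the number of characters outside parentheses, or -1 if the
 * input ends inside a parenthesis or nests deeper than STACK_MAX. */
long count_outside_parens(const char *p, size_t len)
{
    assert(p != NULL);
    const char *pe = p + len;
    int cs = ST_MAIN;
    int stack[STACK_MAX];
    int top = 0;
    long count = 0;

    for (; p != pe; p++) {
        char c = *p;
        switch (cs) {
        case ST_MAIN:
            if (c == '(') {
                /* Like fcall: save the target of this transition
                 * (back to ST_MAIN), then enter the sub-machine.
                 * top is always 0 here, so the push cannot overflow. */
                stack[top++] = ST_MAIN;
                cs = ST_PAREN;
            } else {
                count++;
            }
            break;
        case ST_PAREN:
            if (c == '(') {
                if (top == STACK_MAX)
                    return -1;          /* nesting too deep */
                stack[top++] = ST_PAREN;
                cs = ST_PAREN;
            } else if (c == ')') {
                /* Like fret: resume at the state saved by the call. */
                cs = stack[--top];
            }
            break;
        }
    }
    return top == 0 ? count : -1;
}
```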
624
Ragel was written by Adrian Thurston <thurston@complang.org>.
Objective-C output contributed by Erich Ocean. D output contributed by
Alan West. Ruby output contributed by Victor Hugo Borja. C# code
generation contributed by Daniel Tang. Contributions to Java code
generation by Colin Fleming.
631
633 re2c(1), flex(1)
634
635 Homepage: http://www.complang.org/ragel/

Ragel 6.6 Dec 2009 RAGEL(1)