RAGEL(1)                 Ragel State Machine Compiler                 RAGEL(1)

NAME
       ragel - compile regular languages into executable state machines

SYNOPSIS
       ragel [options] file

DESCRIPTION
       Ragel compiles executable finite state machines from regular
       languages. Ragel can generate C, C++, Objective-C, D, Java, or Ruby
       code. Ragel state machines can not only recognize byte sequences as
       regular expression machines do, but can also execute code at
       arbitrary points in the recognition of a regular language. User code
       is embedded using inline operators that do not disrupt the regular
       language syntax.

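To make the idea of running code during recognition concrete, here is a minimal Python sketch. It is not Ragel output, and Python is used purely for illustration. The machine, the state constants, and the names `step`, `run`, and `add_digit` are all assumptions of this sketch. It hand-codes a machine for one or more digits and runs an action on every transition, much as an embedded action would; for example, `run("1234")` accepts the input and accumulates the value 1234.

```python
# Hand-written stand-in for a generated machine that recognizes one or
# more digits and accumulates their decimal value while matching.
START, IN_NUMBER, ERROR = 0, 1, -1
FINAL_STATES = {IN_NUMBER}

def add_digit(ctx, ch):
    """Action executed on every digit transition."""
    ctx["value"] = ctx["value"] * 10 + int(ch)

def step(state, ch):
    """Return (next_state, action) for one input character."""
    if state in (START, IN_NUMBER) and "0" <= ch <= "9":
        return IN_NUMBER, add_digit
    return ERROR, None

def run(data):
    """Feed data through the machine; return (accepted, value)."""
    ctx = {"value": 0}
    cs = START                      # current state
    for ch in data:
        cs, action = step(cs, ch)
        if cs == ERROR:
            return False, ctx["value"]
        action(ctx, ch)             # user code runs mid-recognition
    return cs in FINAL_STATES, ctx["value"]
```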
       The core language consists of standard regular expression operators,
       such as union, concatenation, and Kleene star, accompanied by action
       embedding operators. Ragel also provides operators that let you
       control any non-determinism that you create, construct scanners using
       the longest-match paradigm, and build state machines using the
       statechart model. It is also possible to influence the execution of a
       state machine from inside an embedded action by jumping or calling to
       other parts of the machine and reprocessing input.

       Ragel provides a very flexible interface to the host language that
       attempts to place minimal restrictions on how the generated code is
       used and integrated into the application. The generated code has no
       dependencies.

OPTIONS
       -h, -H, -?, --help
              Display help and exit.

       -v     Print version information and exit.

       -o file
              Write output to file. If -o is not given, a default file name
              is chosen by replacing the suffix of the input. For source
              files ending in .rh the suffix .h is used. For all other
              source files a suffix based on the output language is used
              (.c, .cpp, .m, .dot).

       -s     Print some statistics on standard error.

       --error-format=gnu
              Print error messages using the format "file:line:column:"
              (default).

       --error-format=msvc
              Print error messages using the format "file(line,column):".

       -d     Do not remove duplicate actions from action lists.

       -I dir
              Add dir to the list of directories to search for included and
              imported files.

       -n     Do not perform state minimization.

       -m     Perform minimization once, at the end of the state machine
              compilation.

       -l     Minimize after nearly every operation. Lists of like
              operations such as unions are minimized once at the end. This
              is the default minimization option.

       -e     Minimize after every operation.

       -x     Run the frontend only: emit XML intermediate format.

       -V     Generate a dot file for Graphviz.

       -p     Display printable characters on labels.

       -S <spec>
              FSM specification to output.

       -M <machine>
              Machine definition/instantiation to output.

       -C     The host language is C, C++, Obj-C or Obj-C++. This is the
              default host language option.

       -D     The host language is D.

       -J     The host language is Java.

       -R     The host language is Ruby.

       -L     Inhibit writing of #line directives.

       -T0    Table-driven FSM (default).

       -T1    Faster table-driven FSM.

       -F0    Flat table-driven FSM.

       -F1    Faster flat table-driven FSM.

       -G0    Goto-driven FSM.

       -G1    Faster goto-driven FSM.

       -G2    Really fast goto-driven FSM.

       -P<N>  N-Way Split really fast goto-driven FSM.

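The default output name rule given for the -o option can be sketched in Python. This only illustrates the rule as worded in this page and is not Ragel's own code. The function `default_output_name` and its `language_suffix` parameter are hypothetical names, and the handling of an input name that has no suffix is an assumption. Under these assumptions, `scanner.rl` with a C target becomes `scanner.c`, while `defs.rh` becomes `defs.h` whatever the target.

```python
import os

def default_output_name(input_name, language_suffix):
    """Derive the output name used when -o is not given.

    language_suffix is the suffix for the selected output language,
    for example ".c"; choosing it is outside this sketch.
    """
    root, ext = os.path.splitext(input_name)
    if ext == ".rh":
        return root + ".h"          # .rh sources always map to .h
    # Assumption: an input with no suffix simply gains the new suffix.
    return root + language_suffix
```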
RAGEL INPUT
       NOTE: This is a very brief description of Ragel input. Ragel is
       described in more detail in the user guide available from the
       homepage (see below).

       Ragel normally passes input files straight to the output. When it
       sees an FSM specification that contains machine instantiations it
       stops to generate the state machine. If there are write statements
       (such as "write exec") then ragel emits the corresponding code. There
       can be any number of FSM specifications in an input file. A
       multi-line FSM specification starts with '%%{' and ends with '}%%'. A
       single-line FSM specification starts with %% and ends at the first
       newline.

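The two delimiter forms can be illustrated with a short Python sketch that separates host text from FSM specifications. This is a simplification and not how Ragel itself scans input: it makes no attempt to handle delimiters that appear inside host-language string literals or comments, and the names `SPEC` and `split_input` are hypothetical.

```python
import re

# The multi-line form is tried first so that "%%{" is not mistaken for
# the start of a single-line specification.
SPEC = re.compile(r"%%\{(.*?)\}%%|%%([^\n]*)", re.DOTALL)

def split_input(text):
    """Return (host_text, specs) for a Ragel-style input string."""
    specs = []

    def keep(match):
        multi, single = match.group(1), match.group(2)
        body = multi if multi is not None else single
        specs.append(body.strip())
        return ""                   # drop the specification from host text

    host = SPEC.sub(keep, text)
    return host, specs
```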
STATEMENTS
       Machine Name:
              Set the name of the machine. If given, it must be the first
              statement.

       Alphabet Type:
              Set the data type of the alphabet.

       GetKey:
              Specify how to retrieve the alphabet character from the
              element type.

       Include:
              Include a machine of the same name as the current one or of a
              different name, from either the current file or some other
              file.

       Action Definition:
              Define an action that can be invoked by the FSM.

       Fsm Definition, Instantiation and Longest Match Instantiation:
              Used to build FSMs. The syntax is described in the next few
              sections.

       Access:
              Specify how to access the persistent state machine variables.

       Write: Write some component of the machine.

       Variable:
              Override the default variable names (p, pe, cs, act, etc).

BASIC MACHINES
       The basic machines are the base operands of the regular language
       expressions.

       'hello'
              Concat literal. Produces a concatenation of the characters in
              the string. Supports escape sequences with '\'. The result
              will have a start state and a transition to a new state for
              each character in the string. The last state in the sequence
              will be made final. To make the string case-insensitive,
              append an 'i' to the string, as in 'cmd'i.

       "hello"
              Identical to the single-quote version.

       [hello]
              Or literal. Produces a union of characters. Supports character
              ranges with '-', negating the sense of the union with an
              initial '^', and escape sequences with '\'. The result will
              have two states with a transition between them for each
              character or range.

       NOTE: '', "", and [] produce null FSMs. Null machines have one state
       that is both a start state and a final state and match the zero
       length string. A null machine may be created with the null builtin
       machine.

       integer
              Makes a two state machine with one transition on the given
              integer number.

       hex    Makes a two state machine with one transition on the given
              hexadecimal number.

       /simple_regex/
              A simple regular expression. Supports the notation '.', '*'
              and '[]', character ranges with '-', negating the sense of an
              OR expression with an initial '^', and escape sequences with
              '\'. Also supports one trailing flag: i. Use it to produce a
              case-insensitive regular expression, as in /GET/i.

       lit .. lit
              Specifies a range. The allowable upper and lower bounds are
              concat literals of length one and number machines. For
              example, 0x10..0x20, 0..63, and 'a'..'z' are valid ranges.

       variable_name
              References the machine definition assigned to the variable
              name given.

       builtin_machine
              There are several builtin machines available. They are all two
              state machines for the purpose of matching common classes of
              characters. They are:

              any    Any character in the alphabet.

              ascii  Ascii characters 0..127.

              extend Ascii extended characters. This is the range -128..127
                     for signed alphabets and the range 0..255 for unsigned
                     alphabets.

              alpha  Alphabetic characters /[A-Za-z]/.

              digit  Digits /[0-9]/.

              alnum  Alphanumerics /[0-9A-Za-z]/.

              lower  Lowercase characters /[a-z]/.

              upper  Uppercase characters /[A-Z]/.

              xdigit Hexadecimal digits /[0-9A-Fa-f]/.

              cntrl  Control characters 0..31.

              graph  Graphical characters /[!-~]/.

              print  Printable characters /[ -~]/.

              punct  Punctuation. Graphical characters that are not
                     alphanumerics /[!-/:-@\[-`{-~]/.

              space  Whitespace /[\t\v\f\n\r ]/.

              null   Zero length string. Equivalent to '', "" and [].

              empty  Empty set. Matches nothing.

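As a cross-check of the character classes listed for the builtin machines, the following Python sketch builds each ASCII-based class as a set of character codes. It models only the sets of characters, not the machines themselves; `span`, `BUILTIN`, and `matches` are hypothetical names, and the alphabet-dependent machines (any, extend) and the null and empty machines are left out. The sketch also shows that punct, as defined above, is exactly graph minus alnum.

```python
def span(lo, hi):
    """Inclusive range of character codes from lo to hi."""
    return set(range(ord(lo), ord(hi) + 1))

BUILTIN = {
    "ascii":  set(range(0, 128)),
    "alpha":  span("A", "Z") | span("a", "z"),
    "digit":  span("0", "9"),
    "lower":  span("a", "z"),
    "upper":  span("A", "Z"),
    "xdigit": span("0", "9") | span("A", "F") | span("a", "f"),
    "cntrl":  set(range(0, 32)),
    "graph":  span("!", "~"),
    "print":  span(" ", "~"),
    "space":  {ord(c) for c in "\t\v\f\n\r "},
}
BUILTIN["alnum"] = BUILTIN["alpha"] | BUILTIN["digit"]
# Punctuation: graphical characters that are not alphanumerics.
BUILTIN["punct"] = BUILTIN["graph"] - BUILTIN["alnum"]

def matches(name, ch):
    """True if the single character ch belongs to the named class."""
    return ord(ch) in BUILTIN[name]
```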
REGULAR EXPRESSION OPERATORS
       Operators are grouped by precedence, group 1 being the lowest and
       group 9 the highest.

       GROUP 1:

       expr , expr
              Join machines together without drawing any transitions,
              setting up a start state or any final states. The start state
              must be explicitly specified with the "start" label. Final
              states may be specified with epsilon transitions to the
              implicitly created "final" state.

       GROUP 2:

       expr | expr
              Produces a machine that matches any string in machine one or
              machine two.

       expr & expr
              Produces a machine that matches any string that is in both
              machine one and machine two.

       expr - expr
              Produces a machine that matches any string that is in machine
              one but not in machine two.

       expr -- expr
              Strong Subtraction. Matches any string in machine one that
              does not have any string in machine two as a substring.

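The group 2 operators can be read as set operations on the languages the machines accept. The Python sketch below models each machine as a finite set of strings, which is an assumption made for illustration since real machines may accept infinitely many strings; the function names are hypothetical. With a = {"cat", "concat", "dog"} and b = {"cat"}, ordinary subtraction keeps "concat" and "dog", while strong subtraction keeps only "dog" because "concat" contains "cat".

```python
def union(a, b):
    """expr | expr: strings in either machine."""
    return a | b

def intersection(a, b):
    """expr & expr: strings in both machines."""
    return a & b

def difference(a, b):
    """expr - expr: strings in machine one but not in machine two."""
    return a - b

def strong_difference(a, b):
    """expr -- expr: strings of a that contain no string of b."""
    return {s for s in a if not any(t in s for t in b)}
```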
       GROUP 3:

       expr . expr
              Produces a machine that matches all the strings in machine one
              followed by all the strings in machine two.

       expr :> expr
              Entry-Guarded Concatenation: terminates machine one upon entry
              to machine two.

       expr :>> expr
              Finish-Guarded Concatenation: terminates machine one when
              machine two finishes.

       expr <: expr
              Left-Guarded Concatenation: gives a higher priority to machine
              one.

       NOTE: Concatenation is the default operator. Two machines next to
       each other with no operator between them results in the
       concatenation operation.

       GROUP 4:

       label: expr
              Attaches a label to an expression. Labels can be used by
              epsilon transitions and fgoto and fcall statements in actions.
              Also note that the referencing of a machine definition causes
              the implicit creation of a label by the same name.

       GROUP 5:

       expr -> label
              Draws an epsilon transition to the state defined by label.
              Label must be a name in the current scope. Epsilon transitions
              are resolved when comma operators are evaluated and at the
              root of the expression tree of machine
              assignment/instantiation.

       GROUP 6: Actions

       An action may be a name predefined with an action statement or may be
       specified directly with '{' and '}' in the expression.

       expr > action
              Embeds action into starting transitions.

       expr @ action
              Embeds action into transitions that go into a final state.

       expr $ action
              Embeds action into all transitions. Does not include pending
              out transitions.

       expr % action
              Embeds action into pending out transitions from final states.

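To show which transitions the first three embedding operators select, this Python sketch models a concat literal as a linear machine, following the description of 'hello' under the basic machines: one transition per character, with the last state final. The dictionary layout and all function names are assumptions of the sketch, not Ragel internals. The % form is left out because, in an isolated literal, no transitions leave the final state yet. For literal("abc"), the > operator selects the transition on 'a', @ selects the transition on 'c', and $ selects all three.

```python
def literal(text):
    """Linear machine for a concat literal: one transition per character.

    Each transition is a (from_state, character, to_state) tuple.
    """
    transitions = [(i, ch, i + 1) for i, ch in enumerate(text)]
    return {"start": 0, "final": {len(text)}, "transitions": transitions}

def starting(machine):
    """Transitions selected by expr > action (leaving the start state)."""
    return [t for t in machine["transitions"] if t[0] == machine["start"]]

def finishing(machine):
    """Transitions selected by expr @ action (entering a final state)."""
    return [t for t in machine["transitions"] if t[2] in machine["final"]]

def all_transitions(machine):
    """Transitions selected by expr $ action (every transition)."""
    return list(machine["transitions"])
```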
       GROUP 6: EOF Actions

       When a machine's finish routine is called the current state's EOF
       actions are executed.

       expr >/ action
              Embed an EOF action into the start state.

       expr </ action
              Embed an EOF action into all states except the start state.

       expr $/ action
              Embed an EOF action into all states.

       expr %/ action
              Embed an EOF action into final states.

       expr @/ action
              Embed an EOF action into all states that are not final.

       expr <>/ action
              Embed an EOF action into all states that are not the start
              state and that are not final (middle states).

       GROUP 6: Global Error Actions

       Global error actions are stored in states until the final state
       machine has been fully constructed. They are then transferred to
       error transitions, giving the effect of a default action.

       expr >! action
              Embed a global error action into the start state.

       expr <! action
              Embed a global error action into all states except the start
              state.

       expr $! action
              Embed a global error action into all states.

       expr %! action
              Embed a global error action into the final states.

       expr @! action
              Embed a global error action into all states which are not
              final.

       expr <>! action
              Embed a global error action into all states which are not the
              start state and are not final (middle states).

       GROUP 6: Local Error Actions

       Local error actions are stored in states until the named machine is
       fully constructed. They are then transferred to error transitions,
       giving the effect of a default action for a section of the total
       machine. Note that the name may be omitted, in which case the action
       will be transferred to error actions upon construction of the current
       machine.

       expr >^ action
              Embed a local error action into the start state.

       expr <^ action
              Embed a local error action into all states except the start
              state.

       expr $^ action
              Embed a local error action into all states.

       expr %^ action
              Embed a local error action into the final states.

       expr @^ action
              Embed a local error action into all states which are not
              final.

       expr <>^ action
              Embed a local error action into all states which are not the
              start state and are not final (middle states).

       GROUP 6: To-State Actions

       To-state actions are stored in states and executed any time the
       machine moves into a state. This includes regular transitions and
       transfers of control such as fgoto. Note that setting the current
       state from outside the machine (for example during initialization)
       does not count as a transition into a state.

       expr >~ action
              Embed a to-state action into the start state.

       expr <~ action
              Embed a to-state action into all states except the start
              state.

       expr $~ action
              Embed a to-state action into all states.

       expr %~ action
              Embed a to-state action into the final states.

       expr @~ action
              Embed a to-state action into all states which are not final.

       expr <>~ action
              Embed a to-state action into all states which are not the
              start state and are not final (middle states).

       GROUP 6: From-State Actions

       From-state actions are executed whenever a state takes a transition
       on a character. This includes the error transition and a transition
       to self.

       expr >* action
              Embed a from-state action into the start state.

       expr <* action
              Embed a from-state action into every state except the start
              state.

       expr $* action
              Embed a from-state action into all states.

       expr %* action
              Embed a from-state action into the final states.

       expr @* action
              Embed a from-state action into all states which are not final.

       expr <>* action
              Embed a from-state action into all states which are not the
              start state and are not final (middle states).

       GROUP 6: Priority Assignment

       Priorities are assigned to names within transitions. Only priorities
       on the same name are allowed to interact. In the first form of
       priorities the name defaults to the name of the machine definition
       the priority is assigned in. Transitions do not have default
       priorities.

       expr > int
              Assigns the priority int in all transitions leaving the start
              state.

       expr @ int
              Assigns the priority int in all transitions that go into a
              final state.

       expr $ int
              Assigns the priority int in all existing transitions.

       expr % int
              Assigns the priority int in all pending out transitions.

       A second form of priority assignment allows the programmer to specify
       the name to which the priority is assigned, allowing interactions to
       cross machine definition boundaries.

       expr > (name, int)
              Assigns the priority int to name in all transitions leaving
              the start state.

       expr @ (name, int)
              Assigns the priority int to name in all transitions that go
              into a final state.

       expr $ (name, int)
              Assigns the priority int to name in all existing transitions.

       expr % (name, int)
              Assigns the priority int to name in all pending out
              transitions.

       GROUP 7:

       expr * Produces the Kleene star of a machine. Matches zero or more
              repetitions of the machine.

       expr **
              Longest-Match Kleene Star. This version of Kleene star puts a
              higher priority on staying in the machine over wrapping around
              and starting over. This operator is equivalent to
              ( ( expr ) $0 %1 )*.

       expr ? Produces a machine that accepts the machine given or the null
              string. This operator is equivalent to ( expr | '' ).

       expr + Produces the machine concatenated with the Kleene star of
              itself. Matches one or more repetitions of the machine. This
              operator is equivalent to ( expr . expr* ).

       expr {n}
              Produces a machine that matches exactly n repetitions of expr.

       expr {,n}
              Produces a machine that matches anywhere from zero to n
              repetitions of expr.

       expr {n,}
              Produces a machine that matches n or more repetitions of expr.

       expr {n,m}
              Produces a machine that matches n to m repetitions of expr.

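The bounded repetition forms, and the equivalences stated for ? and +, can be checked with a small Python model that treats a machine as a finite set of strings. This is a sketch under that assumption; the unbounded forms (*, +, and {n,}) accept infinitely many strings, so the model can only approximate them up to a chosen upper bound. All function names are hypothetical. For example, repeat({"a"}, 0, 2) yields the empty string, "a", and "aa", and concatenating {"a"} with that result gives the same set as repeat({"a"}, 1, 3), mirroring expr+ = ( expr . expr* ) up to the bound.

```python
def concat(a, b):
    """expr . expr: every string of a followed by every string of b."""
    return {x + y for x in a for y in b}

def power(lang, n):
    """expr {n}: exactly n repetitions."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result

def repeat(lang, lo, hi):
    """expr {n,m}: between lo and hi repetitions, inclusive."""
    out = set()
    for n in range(lo, hi + 1):
        out |= power(lang, n)
    return out

def optional(lang):
    """expr ?: equivalent to ( expr | '' )."""
    return lang | {""}
```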
       GROUP 8:

       ! expr Produces a machine that matches any string not matched by the
              given machine. This operator is equivalent to
              ( extend* - expr ).

       ^ expr Character-Level Negation. Matches any single character not
              matched by the single character machine expr.

       GROUP 9:

       ( expr )
              Forces precedence on operators.

VALUES AVAILABLE IN CODE BLOCKS
       fc     The current character. Equivalent to *p.

       fpc    A pointer to the current character. Equivalent to p.

       fcurs  An integer value representing the current state.

       ftargs An integer value representing the target state.

       fentry(<label>)
              An integer value representing the entry point <label>.

STATEMENTS AVAILABLE IN CODE BLOCKS
       fhold; Do not advance over the current character. Equivalent to
              --p;.

       fexec <expr>;
              Sets the current character to something else. Equivalent to
              p = (<expr>)-1;

       fgoto <label>;
              Jump to the machine defined by <label>.

       fgoto *<expr>;
              Jump to the entry point given by <expr>. The expression must
              evaluate to an integer value representing a state.

       fnext <label>;
              Set the next state to be the entry point defined by <label>.
              The fnext statement does not immediately jump to the specified
              state. Any action code following the statement is executed.

       fnext *<expr>;
              Set the next state to be the entry point given by <expr>. The
              expression must evaluate to an integer value representing a
              state.

       fcall <label>;
              Call the machine defined by <label>. The next fret will jump
              to the target of the transition on which the action is
              invoked.

       fcall *<expr>;
              Call the entry point given by <expr>. The next fret will jump
              to the target of the transition on which the action is
              invoked.

       fret;  Return to the target state of the transition on which the last
              fcall was made.

       fbreak;
              Save the current state and immediately break out of the
              machine.

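The pointer arithmetic behind fhold and fexec is easier to see against a loop that advances p after every action. The Python sketch below models p as an index rather than a pointer and is not generated code; `Cursor`, `drive`, and the two example actions are hypothetical, and the `held` flag exists only so the first example holds once instead of looping forever. Driving "abc" with `hold_first_b` processes the characters a, b, b, c, and driving it with `jump_to_2` processes a, c.

```python
class Cursor:
    def __init__(self, data):
        self.data = data
        self.p = 0              # index of the current character
        self.pe = len(data)     # one past the last character
        self.held = False       # scratch flag for the example action

def fhold(cur):
    cur.p -= 1                  # like --p: cancels the loop's advance

def fexec(cur, pos):
    cur.p = pos - 1             # like p = (pos)-1: loop resumes at pos

def drive(data, action):
    """Run action on each character, advance p, and log what was seen."""
    cur, log = Cursor(data), []
    while cur.p < cur.pe:
        ch = data[cur.p]
        log.append(ch)
        action(cur, ch)
        cur.p += 1              # the advance that fhold and fexec offset
    return "".join(log)

def hold_first_b(cur, ch):
    """Example action: reprocess the first 'b' once."""
    if ch == "b" and not cur.held:
        cur.held = True
        fhold(cur)

def jump_to_2(cur, ch):
    """Example action: on 'a', continue processing at index 2."""
    if ch == "a":
        fexec(cur, 2)
```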
BUGS
       Ragel is still under development and has not yet matured. There are
       probably many bugs.

CREDITS
       Ragel was written by Adrian Thurston <thurston@cs.queensu.ca>.
       Objective-C output contributed by Erich Ocean. D output contributed
       by Alan West.

SEE ALSO
       rlgen-cd(1), rlgen-dot(1), rlgen-java(1), re2c(1), flex(1)

       Homepage: http://www.cs.queensu.ca/~thurston/ragel/

Ragel 6.2                         March 2008                          RAGEL(1)