leex(3)                  Erlang Module Definition                  leex(3)
2
3
4
6 leex - Lexical analyzer generator for Erlang
7
A regular-expression-based lexical analyzer generator for Erlang,
similar to lex or flex.
11
12 Note:
13 The Leex module should be considered experimental as it will be subject
14 to changes in future releases.
15
16
18 ErrorInfo = {ErrorLine,module(),error_descriptor()}
19 ErrorLine = integer()
20 Token = tuple()
21
23 file(FileName) -> LeexRet
24 file(FileName, Options) -> LeexRet
25
26 Types:
27
28 FileName = filename()
29 Options = Option | [Option]
30 Option = - see below -
31 LeexRet = {ok, Scannerfile} | {ok, Scannerfile, Warnings} |
32 error | {error, Errors, Warnings}
33 Scannerfile = filename()
34 Warnings = Errors = [{filename(), [ErrorInfo]}]
35 ErrorInfo = {ErrorLine, module(), Reason}
36 ErrorLine = integer()
Reason = - formattable by format_error/1 -
38
Generates a lexical analyzer from the definition in the input
file. The input file has the extension .xrl, which is added to
the filename if it is not given. The generated module is named
after the input file, without the .xrl extension.
43
44 The current options are:
45
46 dfa_graph:
Generates a .dot file containing a description of the DFA in
a format that can be viewed with Graphviz, www.graphviz.org.
50
51 {includefile,Includefile}:
Uses a specific or customised prologue file instead of the
default lib/parsetools/include/leexinc.hrl, which is
otherwise included.
55
56 {report_errors, bool()}:
57 Causes errors to be printed as they occur. Default is true.
58
59 {report_warnings, bool()}:
60 Causes warnings to be printed as they occur. Default is
61 true.
62
63 warnings_as_errors:
64 Causes warnings to be treated as errors.
65
66 {report, bool()}:
This is a short form for both report_errors and
report_warnings.
69
70 {return_errors, bool()}:
71 If this flag is set, {error, Errors, Warnings} is returned
72 when there are errors. Default is false.
73
74 {return_warnings, bool()}:
75 If this flag is set, an extra field containing Warnings is
76 added to the tuple returned upon success. Default is false.
77
78 {return, bool()}:
This is a short form for both return_errors and
return_warnings.
81
82 {scannerfile, Scannerfile}:
83 Scannerfile is the name of the file that will contain the
84 Erlang scanner code that is generated. The default ("") is
85 to add the extension .erl to FileName stripped of the .xrl
86 extension.
87
88 {verbose, bool()}:
Outputs information from parsing the input file and
generating the internal tables.
91
Any of the Boolean options can be set to true by stating the
name of the option. For example, verbose is equivalent to
{verbose, true}.
95
96 Leex will add the extension .hrl to the Includefile name and the
97 extension .erl to the Scannerfile name, unless the extension is
98 already there.
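
As a rough illustration of how file/2 and its options fit together, the
following sketch writes a tiny definition file, generates the scanner,
compiles it and scans a string. The module name leex_file_demo, the file
name demo_scanner.xrl and the Dir argument are assumptions made for this
example only; they are not part of Leex.

```erlang
-module(leex_file_demo).
-export([run/1]).

%% Hypothetical helper: Dir must be an existing, writable directory.
run(Dir) ->
    Xrl = filename:join(Dir, "demo_scanner.xrl"),
    %% Backslashes are doubled so that \s, \t and \n reach Leex unchanged.
    Definition =
        ["Definitions.\n\n",
         "D = [0-9]\n\n",
         "Rules.\n\n",
         "{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.\n",
         "[\\s\\t\\n]+ : skip_token.\n\n",
         "Erlang code.\n"],
    ok = file:write_file(Xrl, Definition),
    %% return_errors is shorthand for {return_errors, true}; on success
    %% the result is still {ok, Scannerfile}.
    {ok, ScannerFile} =
        leex:file(Xrl, [return_errors, {report_warnings, false}]),
    %% The generated module is named after the .xrl file: demo_scanner.
    {ok, Mod, Bin} = compile:file(ScannerFile, [binary]),
    {module, Mod} = code:load_binary(Mod, ScannerFile, Bin),
    {ok, Tokens, _EndLine} = Mod:string("12 345"),
    Tokens.
```

The scannerfile option is left at its default here so that the generated
file name and the generated module name stay in step.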
99
100 format_error(ErrorInfo) -> Chars
101
102 Types:
103
104 Chars = [char() | Chars]
105
106 Returns a string which describes the error ErrorInfo returned
107 when there is an error in a regular expression.
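
A minimal sketch of turning an Errors (or Warnings) list into readable
text follows. It assumes the usual OTP convention in which the third
element of each ErrorInfo tuple is the value handed to the format_error/1
function of the module named in the second element. The module name
leex_errors_demo and the function describe/1 are hypothetical.

```erlang
-module(leex_errors_demo).
-export([describe/1]).

%% Errors has the shape [{filename(), [ErrorInfo]}], as returned in
%% {error, Errors, Warnings}. Each ErrorInfo is {ErrorLine, Module, Reason}.
describe(Errors) ->
    [{File, Line, lists:flatten(Mod:format_error(Reason))}
     || {File, Infos} <- Errors,
        {Line, Mod, Reason} <- Infos].
```

For instance, the Errors list obtained from a failed
leex:file(File, [return_errors]) call could be passed to describe/1 to get
one {File, Line, Message} tuple per error.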
108
110 The following functions are exported by the generated scanner.
111
113 Module:string(String) -> StringRet
114 Module:string(String, StartLine) -> StringRet
115
116 Types:
117
118 String = string()
119 StringRet = {ok,Tokens,EndLine} | ErrorInfo
120 Tokens = [Token]
121 EndLine = StartLine = integer()
122
123 Scans String and returns all the tokens in it, or an error.
124
125 Note:
It is an error if not all of the characters in String are
consumed.
128
129
Module:token(Cont, Chars) -> {more,Cont1} | {done,TokenRet,RestChars}
Module:token(Cont, Chars, StartLine) -> {more,Cont1} |
{done,TokenRet,RestChars}
133
134 Types:
135
136 Cont = [] | Cont1
137 Cont1 = tuple()
138 Chars = RestChars = string() | eof
139 TokenRet = {ok, Token, EndLine} | {eof, EndLine} | ErrorInfo
140 StartLine = EndLine = integer()
141
This is a re-entrant call to try and scan one token from Chars.
If there are enough characters in Chars to either scan a token
or detect an error then this will be returned with {done,...}.
Otherwise {more,Cont1} will be returned, where Cont1 is used in
the next call to token() with more characters to try and scan
the token. This is continued until a token has been scanned.
Cont is initially [].
149
It is not designed to be called directly by an application, but
to be used through the I/O system, where an application can
typically call it as follows:
153
154 io:request(InFile, {get_until,unicode,Prompt,Module,token,[Line]})
155 -> TokenRet
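
The continuation protocol described above can be sketched as a small
driver loop. This is only an illustration: token_feed_demo, next_token/2
and the idea of passing the scanner as a fun (for example
fun my_scanner:token/2, where my_scanner is a hypothetical generated
module) are assumptions, not part of Leex.

```erlang
-module(token_feed_demo).
-export([next_token/2]).

%% TokenFun is a fun with the shape of Module:token/2.
%% Chunks is a list of strings that arrive one at a time.
next_token(TokenFun, Chunks) ->
    feed(TokenFun, [], Chunks).          % Cont is initially []

%% Input exhausted: pass eof so the scanner can finish.
feed(TokenFun, Cont, []) ->
    finish(TokenFun(Cont, eof), []);
feed(TokenFun, Cont, [Chunk | Rest]) ->
    case TokenFun(Cont, Chunk) of
        {more, Cont1} -> feed(TokenFun, Cont1, Rest);
        Done -> finish(Done, Rest)
    end.

%% Returns the token result, the unconsumed characters and the
%% chunks that were never handed to the scanner.
finish({done, TokenRet, RestChars}, Unread) ->
    {TokenRet, RestChars, Unread}.
```

A caller that wants the following token would start again with Cont = []
and prepend RestChars to the remaining input.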
156
Module:tokens(Cont, Chars) -> {more,Cont1} | {done,TokensRet,RestChars}
Module:tokens(Cont, Chars, StartLine) -> {more,Cont1} |
{done,TokensRet,RestChars}
160
161 Types:
162
163 Cont = [] | Cont1
164 Cont1 = tuple()
165 Chars = RestChars = string() | eof
TokensRet = {ok, Tokens, EndLine} | {eof, EndLine} | ErrorInfo
168 Tokens = [Token]
169 StartLine = EndLine = integer()
170
This is a re-entrant call to try and scan tokens from Chars. If
there are enough characters in Chars to either scan tokens or
detect an error then this will be returned with {done,...}.
Otherwise {more,Cont1} will be returned, where Cont1 is used in
the next call to tokens() with more characters to try and scan
the tokens. This is continued until all tokens have been scanned.
Cont is initially [].
178
This function differs from token in that it continues to scan
tokens until an {end_token,Token} has been scanned (see next
section), and then returns all the tokens up to and including
that end token. This is typically used for scanning grammars
like Erlang where there is an explicit end token, '.'. If no end
token is found then the whole file will be scanned and returned.
If an error occurs then all tokens up to and including the next
end token will be skipped.
187
It is not designed to be called directly by an application, but
to be used through the I/O system, where an application can
typically call it as follows:
191
192 io:request(InFile, {get_until,unicode,Prompt,Module,tokens,[Line]})
193 -> TokensRet
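
A minimal sketch of such a loop over a file is shown below. The module
name io_tokens_demo and the function read_all/2 are hypothetical; Mod is
assumed to be a scanner module generated by Leex (any module exporting a
compatible tokens/3 would do). The error branch deliberately does not
assume a particular shape for the error value.

```erlang
-module(io_tokens_demo).
-export([read_all/2]).

%% Reads Path through the I/O system and returns one token list per
%% end token found, or {error, Other} on anything unexpected.
read_all(Path, Mod) ->
    {ok, In} = file:open(Path, [read]),
    try
        loop(In, Mod, 1, [])
    after
        file:close(In)
    end.

loop(In, Mod, Line, Acc) ->
    case io:request(In, {get_until, unicode, '', Mod, tokens, [Line]}) of
        {ok, Tokens, EndLine} ->
            %% Continue from the line where the previous call stopped.
            loop(In, Mod, EndLine, [Tokens | Acc]);
        {eof, _EndLine} ->
            {ok, lists:reverse(Acc)};
        Other ->
            {error, Other}
    end.
```

The file must be opened through an I/O server (that is, not in raw mode)
for the get_until request to be served.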
194
196 Erlang style comments starting with a % are allowed in scanner files. A
197 definition file has the following format:
198
199 <Header>
200
201 Definitions.
202
203 <Macro Definitions>
204
205 Rules.
206
207 <Token Rules>
208
209 Erlang code.
210
211 <Erlang code>
212
213 The "Definitions.", "Rules." and "Erlang code." headings are mandatory
214 and must occur at the beginning of a source line. The <Header>, <Macro
215 Definitions> and <Erlang code> sections may be empty but there must be
216 at least one rule.
217
218 Macro definitions have the following format:
219
220 NAME = VALUE
221
222 and there must be spaces around =. Macros can be used in the regular
223 expressions of rules by writing {NAME}.
224
225 Note:
When macros are expanded in expressions, each macro call is replaced by
the macro value without any form of quoting or enclosing in
parentheses.
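
To make the consequence of unquoted expansion concrete, here is a small
hypothetical definition file. The macro names AB and ABG and the token
names are invented for illustration.

```erlang
Definitions.

AB = a|b
ABG = (a|b)

Rules.

% Expands to a|b+ : a single a, or one or more b.
{AB}+ : {token,{ungrouped,TokenLine,TokenChars}}.

% Expands to (a|b)+ : one or more characters, each an a or a b.
{ABG}+ : {token,{grouped,TokenLine,TokenChars}}.

Erlang code.
```

Writing the parentheses in the macro value, as in ABG, is one way to keep
the macro behaving as a single unit wherever it is used.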
229
230
231 Rules have the following format:
232
233 <Regexp> : <Erlang code>.
234
The <Regexp> must occur at the start of a line and not include any
blanks; use \t and \s to include TAB and SPACE characters in the
regular expression. If <Regexp> matches then the corresponding <Erlang
code> is evaluated to generate a token. Within the Erlang code the
following predefined variables are available:
240
241 TokenChars:
242 A list of the characters in the matched token.
243
244 TokenLen:
245 The number of characters in the matched token.
246
247 TokenLine:
248 The line number where the token occurred.
249
250 The code must return:
251
252 {token,Token}:
253 Return Token to the caller.
254
255 {end_token,Token}:
Return Token to the caller as the last token in a tokens call.
257
258 skip_token:
259 Skip this token completely.
260
261 {error,ErrString}:
An error in the token; ErrString is a string describing the error.
263
264 It is also possible to push back characters into the input characters
265 with the following returns:
266
267 * {token,Token,PushBackList}
268
269 * {end_token,Token,PushBackList}
270
271 * {skip_token,PushBackList}
272
273 These have the same meanings as the normal returns but the characters
274 in PushBackList will be prepended to the input characters and scanned
275 for the next token. Note that pushing back a newline will mean the line
276 numbering will no longer be correct.
277
278 Note:
279 Pushing back characters gives you unexpected possibilities to cause the
280 scanner to loop!
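
As a sketch of where pushing back characters can be useful, the following
hypothetical definition file reports a lower-case name directly followed
by "(" as a call name and pushes the "(" back so that it is scanned as
its own token. The token names call_name and name are invented for the
example.

```erlang
Definitions.

L = [a-z]

Rules.

% The match includes the "(", which is dropped from the token value
% and pushed back onto the input.
{L}+\( :
  {token,{call_name,TokenLine,lists:sublist(TokenChars, TokenLen - 1)},"("}.

{L}+ : {token,{name,TokenLine,TokenChars}}.
\( : {token,{'(',TokenLine}}.
\) : {token,{')',TokenLine}}.
[\s\t\n]+ : skip_token.

Erlang code.
```

The pushed-back "(" is consumed by a different rule, so every step still
makes progress; pushing back everything a rule matched would not.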
281
282
283 The following example would match a simple Erlang integer or float and
284 return a token which could be sent to the Erlang parser:
285
286 D = [0-9]
287
288 {D}+ :
289 {token,{integer,TokenLine,list_to_integer(TokenChars)}}.
290
291 {D}+\.{D}+((E|e)(\+|\-)?{D}+)? :
292 {token,{float,TokenLine,list_to_float(TokenChars)}}.
293
The Erlang code in the "Erlang code." section is written into the
output file directly after the module declaration and predefined exports
declaration, so it is possible to add extra exports, define imports and
other attributes, which are then visible in the whole file.
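
A small hypothetical definition file showing this: the helper classify/2
and the extra export are invented for illustration, and the helper is
called from a rule action.

```erlang
Definitions.

L = [a-z]

Rules.

{L}+ : {token,classify(TokenChars, TokenLine)}.
[\s\t\n]+ : skip_token.

Erlang code.

% Extra export, added after the predefined exports.
-export([classify/2]).

% Helper used by the rule action above.
classify("if", Line) -> {'if', Line};
classify("end", Line) -> {'end', Line};
classify(Chars, Line) -> {atom, Line, list_to_atom(Chars)}.
```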
298
The regular expressions allowed here are a subset of the set found in
egrep and in the AWK programming language, as defined in the book The
AWK Programming Language, by A. V. Aho, B. W. Kernighan and P. J.
Weinberger. They are composed of the following characters:
304
305 c:
306 Matches the non-metacharacter c.
307
308 \c:
309 Matches the escape sequence or literal character c.
310
311 .:
312 Matches any character.
313
314 ^:
315 Matches the beginning of a string.
316
317 $:
318 Matches the end of a string.
319
320 [abc...]:
Character class, which matches any of the characters abc....
Character ranges are specified by a pair of characters separated by
a -.
324
325 [^abc...]:
326 Negated character class, which matches any character except abc....
327
328 r1 | r2:
329 Alternation. It matches either r1 or r2.
330
331 r1r2:
332 Concatenation. It matches r1 and then r2.
333
334 r+:
335 Matches one or more rs.
336
337 r*:
338 Matches zero or more rs.
339
340 r?:
341 Matches zero or one rs.
342
343 (r):
344 Grouping. It matches r.
345
346 The escape sequences allowed are the same as for Erlang strings:
347
348 \b:
349 Backspace.
350
351 \f:
352 Form feed.
353
354 \n:
355 Newline (line feed).
356
357 \r:
358 Carriage return.
359
360 \t:
361 Tab.
362
363 \e:
364 Escape.
365
366 \v:
367 Vertical tab.
368
369 \s:
370 Space.
371
372 \d:
373 Delete.
374
375 \ddd:
376 The octal value ddd.
377
378 \xhh:
379 The hexadecimal value hh.
380
381 \x{h...}:
382 The hexadecimal value h....
383
384 \c:
385 Any other character literally, for example \\ for backslash, \" for
386 ".
387
388 The following examples define simplified versions of a few Erlang data
389 types:
390
391 Atoms [a-z][0-9a-zA-Z_]*
392
393 Variables [A-Z_][0-9a-zA-Z_]*
394
395 Floats (\+|-)?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)?
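
As a sketch, the three expressions above could be used as rules in a
definition file like the following. The token shapes ({atom,...},
{var,...}, {float,...}) and the whitespace rule are assumptions made for
the example.

```erlang
Definitions.

Rules.

[a-z][0-9a-zA-Z_]* : {token,{atom,TokenLine,list_to_atom(TokenChars)}}.
[A-Z_][0-9a-zA-Z_]* : {token,{var,TokenLine,list_to_atom(TokenChars)}}.
(\+|-)?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)? :
  {token,{float,TokenLine,list_to_float(TokenChars)}}.
[\s\t\n]+ : skip_token.

Erlang code.
```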
396
397 Note:
398 Anchoring a regular expression with ^ and $ is not implemented in the
399 current version of Leex and just generates a parse error.
400
401
402
Ericsson AB                      parsetools 2.2                     leex(3)