leex(3) Erlang Module Definition leex(3)

NAME

leex - Lexical analyzer generator for Erlang

DESCRIPTION

A regular expression based lexical analyzer generator for Erlang, similar
to lex or flex.

Note:
    The Leex module should be considered experimental, as it is subject to
    change in future releases.

DATA TYPES

    ErrorInfo = {ErrorLine,module(),error_descriptor()}
    ErrorLine = integer()
    Token = tuple()

EXPORTS

file(FileName) -> LeexRet
file(FileName, Options) -> LeexRet

    Types:

        FileName = filename()
        Options = Option | [Option]
        Option = - see below -
        LeexRet = {ok, Scannerfile} | {ok, Scannerfile, Warnings} |
                  error | {error, Errors, Warnings}
        Scannerfile = filename()
        Warnings = Errors = [{filename(), [ErrorInfo]}]
        ErrorInfo = {ErrorLine, module(), Reason}
        ErrorLine = integer()
        Reason = - formatable by format_error/1 -

    Generates a lexical analyzer from the definition in the input file.
    The input file has the extension .xrl, which is added to FileName if
    it is not given. The resulting module is named after the Xrl file,
    without the .xrl extension.

    The current options are:

    dfa_graph:
        Generates a .dot file that contains a description of the DFA in a
        format that can be viewed with Graphviz, www.graphviz.com.

    {includefile,Includefile}:
        Uses a specific or customised prologue file instead of the default
        lib/parsetools/include/leexinc.hrl, which is otherwise included.

    {report_errors, bool()}:
        Causes errors to be printed as they occur. Default is true.

    {report_warnings, bool()}:
        Causes warnings to be printed as they occur. Default is true.

    warnings_as_errors:
        Causes warnings to be treated as errors.

    {report, bool()}:
        This is a short form for both report_errors and report_warnings.

    {return_errors, bool()}:
        If this flag is set, {error, Errors, Warnings} is returned when
        there are errors. Default is false.

    {return_warnings, bool()}:
        If this flag is set, an extra field containing Warnings is added
        to the tuple returned upon success. Default is false.

    {return, bool()}:
        This is a short form for both return_errors and return_warnings.

    {scannerfile, Scannerfile}:
        Scannerfile is the name of the file that will contain the
        generated Erlang scanner code. The default ("") is to add the
        extension .erl to FileName stripped of the .xrl extension.

    {verbose, bool()}:
        Outputs information from parsing the input file and generating
        the internal tables.

    Any of the Boolean options can be set to true by stating the name of
    the option. For example, verbose is equivalent to {verbose, true}.

    Leex adds the extension .hrl to the Includefile name and the extension
    .erl to the Scannerfile name, unless the extension is already there.
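As a rough illustration of the options above, here is a minimal escript sketch. It runs leex:file/2 on a small throwaway .xrl file with `return` and `{report, false}`. It assumes parsetools is on the code path, as it is in a standard Erlang/OTP installation. The file name demo_opts.xrl and the temporary directory are hypothetical. According to the return types above, setting return_warnings (here via `return`) should make a successful run yield {ok, Scannerfile, Warnings}.

```erlang
#!/usr/bin/env escript
%% Sketch: run leex:file/2 with the `return` and `{report,false}` options.
%% The scanner name demo_opts and the temporary directory are hypothetical.

main(_) ->
    io:format("~p~n", [demo()]).

demo() ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, "demo_opts.xrl"),
    ok = file:write_file(Xrl,
                         "Definitions.\n"
                         "D = [0-9]\n"
                         "Rules.\n"
                         "{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.\n"
                         "Erlang code.\n"),
    %% `return` sets both return_errors and return_warnings, so success
    %% yields a 3-tuple; `{report,false}` stops leex printing anything.
    {ok, ScannerFile, Warnings} = leex:file(Xrl, [return, {report, false}]),
    {filename:basename(ScannerFile), is_list(Warnings)}.
```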

format_error(ErrorInfo) -> Chars

    Types:

        Chars = [char() | Chars]

    Returns a string that describes the error ErrorInfo returned when
    there is an error in a regular expression.
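This sketch makes the same assumptions as the escript for leex:file/2. It feeds leex:file/2 a hypothetical rule whose character class is never closed, collects the errors with `return`, and formats each one. Note that it passes the Reason element of each ErrorInfo tuple to Module:format_error/1. That follows the usual Erlang convention for error descriptors, but it is an assumption and not something this page states.

```erlang
#!/usr/bin/env escript
%% Sketch: collect leex errors with `return` and turn them into messages.
%% Assumption: Reason (third element of each ErrorInfo) is what
%% Module:format_error/1 expects, as with other Erlang tools.

main(_) ->
    lists:foreach(fun(Msg) -> io:format("~ts~n", [Msg]) end, demo()).

demo() ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, "demo_bad.xrl"),
    %% The character class in the rule below is deliberately unterminated.
    ok = file:write_file(Xrl,
                         "Definitions.\n"
                         "Rules.\n"
                         "[a-z : {token,{word,TokenLine}}.\n"
                         "Erlang code.\n"),
    {error, Errors, _Warnings} = leex:file(Xrl, [return, {report, false}]),
    %% Errors = [{File, [{ErrorLine, Module, Reason}]}]
    [lists:flatten(io_lib:format("~ts:~w: ~ts",
                                 [filename:basename(File), Line,
                                  Mod:format_error(Reason)]))
     || {File, Infos} <- Errors, {Line, Mod, Reason} <- Infos].
```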

DEFAULT GENERATED SCANNER EXPORTS

The following functions are exported by the generated scanner.

string(String) -> StringRet
string(String, StartLine) -> StringRet

    Types:

        String = string()
        StringRet = {ok,Tokens,EndLine} | ErrorInfo
        Tokens = [Token]
        EndLine = StartLine = integer()

    Scans String and returns all the tokens in it, or an error.

    Note:
        It is an error if not all of the characters in String are
        consumed.
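Here is a minimal sketch of string/1 and string/2, written as an escript. The scanner demo_string and its two rules are illustrative only. So is load_scanner/2, a hypothetical helper that writes the .xrl to a temporary directory, runs leex:file/1, then compiles and loads the result. The error case is checked only by its first element, because this page describes the error return only as ErrorInfo.

```erlang
#!/usr/bin/env escript
%% Sketch: calling the generated string/1 and string/2 functions.
%% The module name demo_string and its rules are hypothetical.

main(_) ->
    io:format("~p~n", [demo()]).

demo() ->
    Mod = load_scanner("demo_string",
                       "Definitions.\n"
                       "D = [0-9]\n"
                       "Rules.\n"
                       "{D}+ : {token,{int,TokenLine,list_to_integer(TokenChars)}}.\n"
                       "[\\s\\t\\n]+ : skip_token.\n"
                       "Erlang code.\n"),
    {Mod:string("12 34"),              % StartLine defaults to 1
     Mod:string("12\n34", 10),         % the newline puts "34" on line 11
     element(1, Mod:string("12x"))}.   % no rule matches "x", so an error

%% Hypothetical helper: writes Src to <tmp>/leex_demo/Name.xrl, runs
%% leex:file/1 on it, then compiles and loads the generated scanner.
load_scanner(Name, Src) ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, Name ++ ".xrl"),
    ok = file:write_file(Xrl, Src),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary, report_errors]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    Mod.
```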

token(Cont, Chars) -> {more,Cont1} | {done,TokenRet,RestChars}
token(Cont, Chars, StartLine) -> {more,Cont1} | {done,TokenRet,RestChars}

    Types:

        Cont = [] | Cont1
        Cont1 = tuple()
        Chars = RestChars = string() | eof
        TokenRet = {ok, Token, EndLine} | {eof, EndLine} | ErrorInfo
        StartLine = EndLine = integer()

    This is a re-entrant call that tries to scan one token from Chars. If
    there are enough characters in Chars either to scan a token or to
    detect an error, the result is returned as {done,...}. Otherwise
    {more,Cont1} is returned, and Cont1 is passed to the next call to
    token() together with more characters so that scanning of the token
    can continue. This is repeated until a token has been scanned. Cont is
    initially [].

    It is not designed to be called directly by an application, but is
    used through the I/O system, where an application can typically call
    it as follows:

        io:request(InFile, {get_until,Prompt,Module,token,[Line]})
          -> TokenRet
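To show what re-entrant means in practice, this sketch calls token/3 directly with the input split into chunks, standing in for the I/O system. It reuses the hypothetical load_scanner/2 helper. The module demo_token and the feed/3 loop are illustrative and not part of Leex.

```erlang
#!/usr/bin/env escript
%% Sketch: driving the re-entrant token/3 by hand, feeding input in
%% chunks the way the I/O system would. Names here are hypothetical.

main(_) ->
    io:format("~p~n", [demo()]).

demo() ->
    Mod = load_scanner("demo_token",
                       "Definitions.\n"
                       "Rules.\n"
                       "[0-9]+ : {token,{int,TokenLine,list_to_integer(TokenChars)}}.\n"
                       "[\\s\\t\\n]+ : skip_token.\n"
                       "Erlang code.\n"),
    %% "12" could still be the start of a longer integer, so the first
    %% call returns {more, Cont1} and the second call completes "123".
    feed(Mod, [], ["12", "3 45"]).

%% Passes one chunk per call; once the chunks run out, eof is passed so
%% the scanner can finish. Returns {TokenRet, RestChars}.
feed(Mod, Cont, Chunks) ->
    {Chars, Later} = case Chunks of
                         [Chunk | More] -> {Chunk, More};
                         [] -> {eof, []}
                     end,
    case Mod:token(Cont, Chars, 1) of
        {more, Cont1} -> feed(Mod, Cont1, Later);
        {done, TokenRet, RestChars} -> {TokenRet, RestChars}
    end.

%% Hypothetical helper: writes Src to <tmp>/leex_demo/Name.xrl, runs
%% leex:file/1 on it, then compiles and loads the generated scanner.
load_scanner(Name, Src) ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, Name ++ ".xrl"),
    ok = file:write_file(Xrl, Src),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary, report_errors]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    Mod.
```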

tokens(Cont, Chars) -> {more,Cont1} | {done,TokensRet,RestChars}
tokens(Cont, Chars, StartLine) -> {more,Cont1} | {done,TokensRet,RestChars}

    Types:

        Cont = [] | Cont1
        Cont1 = tuple()
        Chars = RestChars = string() | eof
        TokensRet = {ok, Tokens, EndLine} | {eof, EndLine} | ErrorInfo
        Tokens = [Token]
        StartLine = EndLine = integer()

    This is a re-entrant call that tries to scan tokens from Chars. If
    there are enough characters in Chars either to scan tokens or to
    detect an error, the result is returned as {done,...}. Otherwise
    {more,Cont1} is returned, and Cont1 is passed to the next call to
    tokens() together with more characters so that scanning of the tokens
    can continue. This is repeated until all tokens have been scanned.
    Cont is initially [].

    This function differs from token in that it continues to scan tokens
    up to and including an {end_token,Token} (see the next section). It
    then returns all the tokens. This is typically used for scanning
    grammars like Erlang, where there is an explicit end token, '.'. If no
    end token is found, the whole file is scanned and returned. If an
    error occurs, all tokens up to and including the next end token are
    skipped.

    It is not designed to be called directly by an application, but is
    used through the I/O system, where an application can typically call
    it as follows:

        io:request(InFile, {get_until,Prompt,Module,tokens,[Line]})
          -> TokensRet
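The following sketch does the same for tokens/3. A hypothetical rule returns {end_token,...} for a period, so scanning stops at the end token and the remaining characters are handed back. The module name, the rules and both helpers are assumptions made for the example.

```erlang
#!/usr/bin/env escript
%% Sketch: tokens/3 collects tokens up to and including an end token.
%% The module demo_tokens and its rules are hypothetical.

main(_) ->
    io:format("~p~n", [demo()]).

demo() ->
    Mod = load_scanner("demo_tokens",
                       "Definitions.\n"
                       "Rules.\n"
                       "[a-z]+ : {token,{atom,TokenLine,list_to_atom(TokenChars)}}.\n"
                       "\\. : {end_token,{dot,TokenLine}}.\n"
                       "[\\s\\t\\n]+ : skip_token.\n"
                       "Erlang code.\n"),
    %% Scanning stops after the first "."; " c." is returned unscanned.
    feed(Mod, [], ["a b", ". c."]).

%% Passes one chunk per call to tokens/3 (eof once chunks run out).
feed(Mod, Cont, Chunks) ->
    {Chars, Later} = case Chunks of
                         [Chunk | More] -> {Chunk, More};
                         [] -> {eof, []}
                     end,
    case Mod:tokens(Cont, Chars, 1) of
        {more, Cont1} -> feed(Mod, Cont1, Later);
        {done, TokensRet, RestChars} -> {TokensRet, RestChars}
    end.

%% Hypothetical helper: writes Src to <tmp>/leex_demo/Name.xrl, runs
%% leex:file/1 on it, then compiles and loads the generated scanner.
load_scanner(Name, Src) ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, Name ++ ".xrl"),
    ok = file:write_file(Xrl, Src),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary, report_errors]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    Mod.
```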

INPUT FILE FORMAT

Erlang style comments starting with a % are allowed in scanner files. A
definition file has the following format:

    <Header>

    Definitions.

    <Macro Definitions>

    Rules.

    <Token Rules>

    Erlang code.

    <Erlang code>

The "Definitions.", "Rules." and "Erlang code." headings are mandatory
and must occur at the beginning of a source line. The <Header>, <Macro
Definitions> and <Erlang code> sections may be empty, but there must be
at least one rule.

Macro definitions have the following format:

    NAME = VALUE

and there must be spaces around =. Macros can be used in the regular
expressions of rules by writing {NAME}.

Note:
    When macros are expanded in expressions, the macro calls are replaced
    by the macro value without any form of quoting or enclosing in
    parentheses.
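The note about unparenthesized expansion is easy to trip over. This sketch uses hypothetical macro names AB and PAB and the hypothetical load_scanner/2 helper. Because {AB}c expands to a|bc, it matches a lone "a". Putting parentheses in the macro value gives the intended grouping.

```erlang
#!/usr/bin/env escript
%% Sketch: macros are substituted textually, without parentheses.
%% Macro names, token tags and the module demo_macros are hypothetical.

main(_) ->
    io:format("~p~n", [demo()]).

demo() ->
    %% {AB}c  expands to a|bc   : matches "a" or "bc".
    %% {PAB}d expands to (a|b)d : matches "ad" or "bd".
    Mod = load_scanner("demo_macros",
                       "Definitions.\n"
                       "AB = a|b\n"
                       "PAB = (a|b)\n"
                       "Rules.\n"
                       "{AB}c : {token,{bare,TokenChars}}.\n"
                       "{PAB}d : {token,{paren,TokenChars}}.\n"
                       "Erlang code.\n"),
    {Mod:string("a"),                 % "a" alone matches {AB}c
     Mod:string("ad"),                % longest match picks {PAB}d
     element(1, Mod:string("ac"))}.   % "c" after "a" matches nothing

%% Hypothetical helper: writes Src to <tmp>/leex_demo/Name.xrl, runs
%% leex:file/1 on it, then compiles and loads the generated scanner.
load_scanner(Name, Src) ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, Name ++ ".xrl"),
    ok = file:write_file(Xrl, Src),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary, report_errors]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    Mod.
```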

Rules have the following format:

    <Regexp> : <Erlang code>.

The <Regexp> must occur at the start of a line and must not include any
blanks; use \t and \s to include TAB and SPACE characters in the regular
expression. If <Regexp> matches, the corresponding <Erlang code> is
evaluated to generate a token. Within the Erlang code, the following
predefined variables are available:

    TokenChars:
        A list of the characters in the matched token.

    TokenLen:
        The number of characters in the matched token.

    TokenLine:
        The line number where the token occurred.
249 The code must return:
250
251 {token,Token}:
252 Return Token to the caller.
253
254 {end_token,Token}:
255 Return Token and is last token in a tokens call.
256
257 skip_token:
258 Skip this token completely.
259
260 {error,ErrString}:
261 An error in the token, ErrString is a string describing the error.
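This sketch exercises skip_token and {error,ErrString} through string/1, with hypothetical rules and names. It also makes two assumptions that this page does not list: the generated module exports format_error/1, and that function renders a user error as ErrString itself.

```erlang
#!/usr/bin/env escript
%% Sketch: skip_token and {error,ErrString} returns from rule actions.
%% Assumes the generated scanner exports format_error/1 (not listed here).

main(_) ->
    io:format("~p~n", [demo()]).

demo() ->
    Mod = load_scanner("demo_returns",
                       "Definitions.\n"
                       "Rules.\n"
                       "[0-9]+ : {token,{int,TokenLine,list_to_integer(TokenChars)}}.\n"
                       "[a-z]+ : {error,\"letters not allowed: \" ++ TokenChars}.\n"
                       "[\\s\\t\\n]+ : skip_token.\n"
                       "Erlang code.\n"),
    %% The error tuple carries an ErrorInfo; format its descriptor.
    {error, {_Line, Mod, Desc}, _EndLine} = Mod:string("1 x"),
    {Mod:string("1 2"), lists:flatten(Mod:format_error(Desc))}.

%% Hypothetical helper: writes Src to <tmp>/leex_demo/Name.xrl, runs
%% leex:file/1 on it, then compiles and loads the generated scanner.
load_scanner(Name, Src) ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, Name ++ ".xrl"),
    ok = file:write_file(Xrl, Src),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary, report_errors]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    Mod.
```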

It is also possible to push back characters into the input with the
following returns:

  * {token,Token,PushBackList}

  * {end_token,Token,PushBackList}

  * {skip_token,PushBackList}

These have the same meanings as the normal returns, but the characters in
PushBackList are prepended to the input characters and scanned for the
next token. Note that pushing back a newline means that the line
numbering is no longer correct.

Note:
    Pushing back characters makes it possible to cause the scanner to
    loop unintentionally!
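Here is a sketch of a push-back return. The hypothetical first rule recognizes a word followed by =, returns the word as a key token, and pushes the = back. A separate rule then turns the = into its own token. The pushed-back input cannot match the first rule again, so the scanner does not loop. All names are illustrative, and lists:droplast/1 requires OTP 17 or later.

```erlang
#!/usr/bin/env escript
%% Sketch: {token,Token,PushBackList} returns "=" to the input stream.
%% Rules, token tags and the module demo_pushback are hypothetical.

main(_) ->
    io:format("~p~n", [demo()]).

demo() ->
    Mod = load_scanner("demo_pushback",
                       "Definitions.\n"
                       "Rules.\n"
                       "[a-z]+= : {token,{key,TokenLine,lists:droplast(TokenChars)},\"=\"}.\n"
                       "[a-z]+ : {token,{word,TokenLine,TokenChars}}.\n"
                       "= : {token,{'=',TokenLine}}.\n"
                       "Erlang code.\n"),
    %% "ab=" wins by longest match; "=" is then rescanned by the last rule.
    Mod:string("ab=cd").

%% Hypothetical helper: writes Src to <tmp>/leex_demo/Name.xrl, runs
%% leex:file/1 on it, then compiles and loads the generated scanner.
load_scanner(Name, Src) ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, Name ++ ".xrl"),
    ok = file:write_file(Xrl, Src),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary, report_errors]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    Mod.
```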

The following example matches a simple Erlang integer or float and
returns a token that could be sent to the Erlang parser:

    Definitions.

    D = [0-9]

    Rules.

    {D}+ :
      {token,{integer,TokenLine,list_to_integer(TokenChars)}}.

    {D}+\.{D}+((E|e)(\+|\-)?{D}+)? :
      {token,{float,TokenLine,list_to_float(TokenChars)}}.

    Erlang code.
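The integer and float rules can be tried out with a sketch like the following. It adds a whitespace rule, which is an assumption made so that several numbers can be separated by blanks. It loads the scanner with the hypothetical load_scanner/2 helper. Longest match is what turns "3.14" into a float rather than the integer 3 followed by an error.

```erlang
#!/usr/bin/env escript
%% Sketch: the integer/float rules plus a (hypothetical) whitespace rule.
%% The module name demo_numbers is hypothetical.

main(_) ->
    io:format("~p~n", [demo()]).

demo() ->
    Mod = load_scanner("demo_numbers",
                       "Definitions.\n"
                       "D = [0-9]\n"
                       "Rules.\n"
                       "{D}+ :\n"
                       "  {token,{integer,TokenLine,list_to_integer(TokenChars)}}.\n"
                       "{D}+\\.{D}+((E|e)(\\+|\\-)?{D}+)? :\n"
                       "  {token,{float,TokenLine,list_to_float(TokenChars)}}.\n"
                       "[\\s\\t\\n]+ : skip_token.\n"
                       "Erlang code.\n"),
    Mod:string("42 3.14 1.0e3").

%% Hypothetical helper: writes Src to <tmp>/leex_demo/Name.xrl, runs
%% leex:file/1 on it, then compiles and loads the generated scanner.
load_scanner(Name, Src) ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, Name ++ ".xrl"),
    ok = file:write_file(Xrl, Src),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary, report_errors]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    Mod.
```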

The Erlang code in the "Erlang code." section is written into the output
file directly after the module declaration and the predefined exports
declaration, so it is possible to add extra exports, define imports and
other attributes, which are then visible in the whole file.
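As a sketch of using the Erlang code section, the hypothetical helper unquote/1 below is defined there and called from a rule action. The module name, token tag and helper are illustrative.

```erlang
#!/usr/bin/env escript
%% Sketch: a function defined in the "Erlang code." section is callable
%% from rule actions. unquote/1 and demo_strings are hypothetical.

main(_) ->
    io:format("~p~n", [demo()]).

demo() ->
    Mod = load_scanner("demo_strings",
                       "Definitions.\n"
                       "Rules.\n"
                       "\"[^\"]*\" : {token,{string,TokenLine,unquote(TokenChars)}}.\n"
                       "[\\s\\t\\n]+ : skip_token.\n"
                       "Erlang code.\n"
                       "%% Drops the surrounding double quotes from a string token.\n"
                       "unquote(Chars) -> lists:sublist(Chars, 2, length(Chars) - 2).\n"),
    Mod:string("\"hi there\"").

%% Hypothetical helper: writes Src to <tmp>/leex_demo/Name.xrl, runs
%% leex:file/1 on it, then compiles and loads the generated scanner.
load_scanner(Name, Src) ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, Name ++ ".xrl"),
    ok = file:write_file(Xrl, Src),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary, report_errors]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    Mod.
```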

REGULAR EXPRESSIONS

The regular expressions allowed here are a subset of the set found in
egrep and in the AWK programming language, as defined in the book The AWK
Programming Language by A. V. Aho, B. W. Kernighan and P. J. Weinberger.
They are composed of the following characters:

    c:
        Matches the non-metacharacter c.

    \c:
        Matches the escape sequence or literal character c.

    .:
        Matches any character.

    ^:
        Matches the beginning of a string.

    $:
        Matches the end of a string.

    [abc...]:
        Character class, which matches any of the characters abc....
        Character ranges are specified by a pair of characters separated
        by a -, as in [a-z].

    [^abc...]:
        Negated character class, which matches any character except
        abc....

    r1 | r2:
        Alternation. It matches either r1 or r2.

    r1r2:
        Concatenation. It matches r1 and then r2.

    r+:
        Matches one or more occurrences of r.

    r*:
        Matches zero or more occurrences of r.

    r?:
        Matches zero or one occurrence of r.

    (r):
        Grouping. It matches r.
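The following sketch combines several of these operators in hypothetical rules for hex literals, decimal integers and other words: grouping, alternation, character classes, a negated class and +. The token tags and module name are illustrative.

```erlang
#!/usr/bin/env escript
%% Sketch: grouping, alternation, (negated) classes and + in leex rules.
%% Rules, token tags and the module demo_regex are hypothetical.

main(_) ->
    io:format("~p~n", [demo()]).

demo() ->
    Mod = load_scanner("demo_regex",
                       "Definitions.\n"
                       "Rules.\n"
                       "0(x|X)[0-9a-fA-F]+ : {token,{hex,list_to_integer(lists:nthtail(2, TokenChars), 16)}}.\n"
                       "[0-9]+ : {token,{int,list_to_integer(TokenChars)}}.\n"
                       "[^0-9\\s]+ : {token,{other,TokenChars}}.\n"
                       "\\s+ : skip_token.\n"
                       "Erlang code.\n"),
    %% "0x1F" beats the shorter integer match "0" by longest match.
    Mod:string("0x1F 42 ab").

%% Hypothetical helper: writes Src to <tmp>/leex_demo/Name.xrl, runs
%% leex:file/1 on it, then compiles and loads the generated scanner.
load_scanner(Name, Src) ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, Name ++ ".xrl"),
    ok = file:write_file(Xrl, Src),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary, report_errors]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    Mod.
```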

The escape sequences allowed are the same as for Erlang strings:

    \b:
        Backspace.

    \f:
        Form feed.

    \n:
        Newline (line feed).

    \r:
        Carriage return.

    \t:
        Tab.

    \e:
        Escape.

    \v:
        Vertical tab.

    \s:
        Space.

    \d:
        Delete.

    \ddd:
        The octal value ddd.

    \xhh:
        The hexadecimal value hh.

    \x{h...}:
        The hexadecimal value h....

    \c:
        Any other character literally, for example \\ for backslash and
        \" for ".

The following examples define simplified versions of a few Erlang data
types:

    Atoms      [a-z][0-9a-zA-Z_]*

    Variables  [A-Z_][0-9a-zA-Z_]*

    Floats     (\+|-)?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)?
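The three example expressions can be tried out as rules in a sketch like the following. The token tags, the whitespace rule and the load_scanner/2 helper are hypothetical additions.

```erlang
#!/usr/bin/env escript
%% Sketch: the atom, variable and float expressions used as leex rules.
%% Token tags and the module demo_types are hypothetical.

main(_) ->
    io:format("~p~n", [demo()]).

demo() ->
    Mod = load_scanner("demo_types",
                       "Definitions.\n"
                       "Rules.\n"
                       "[a-z][0-9a-zA-Z_]* : {token,{atom,TokenLine,list_to_atom(TokenChars)}}.\n"
                       "[A-Z_][0-9a-zA-Z_]* : {token,{var,TokenLine,list_to_atom(TokenChars)}}.\n"
                       "(\\+|-)?[0-9]+\\.[0-9]+((E|e)(\\+|-)?[0-9]+)? : "
                       "{token,{float,TokenLine,list_to_float(TokenChars)}}.\n"
                       "[\\s\\t\\n]+ : skip_token.\n"
                       "Erlang code.\n"),
    Mod:string("foo Bar _x -2.5e1").

%% Hypothetical helper: writes Src to <tmp>/leex_demo/Name.xrl, runs
%% leex:file/1 on it, then compiles and loads the generated scanner.
load_scanner(Name, Src) ->
    Dir = filename:join(os:getenv("TMPDIR", "/tmp"), "leex_demo"),
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    Xrl = filename:join(Dir, Name ++ ".xrl"),
    ok = file:write_file(Xrl, Src),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary, report_errors]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    Mod.
```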

Note:
    Anchoring a regular expression with ^ and $ is not implemented in the
    current version of Leex and just generates a parse error.

Ericsson AB parsetools 2.1.6 leex(3)