leex(3)                    Erlang Module Definition                    leex(3)

NAME

       leex - Lexical analyzer generator for Erlang

DESCRIPTION

       A regular expression based lexical analyzer generator for Erlang,
       similar to lex or flex.

   Note:
       The Leex module should be considered experimental as it will be subject
       to changes in future releases.

DATA TYPES

       ErrorInfo = {ErrorLine,module(),error_descriptor()}
       ErrorLine = integer()
       Token = tuple()

EXPORTS

       file(FileName [, Options]) -> LeexRet

              Types:

                 FileName = filename()
                 Options = Option | [Option]
                 Option = - see below -
                 LeexRet = {ok, Scannerfile} | {ok, Scannerfile, Warnings} |
                 error | {error, Errors, Warnings}
                 Scannerfile = filename()
                 Warnings = Errors = [{filename(), [ErrorInfo]}]
                 ErrorInfo = {ErrorLine, module(), Reason}
                 ErrorLine = integer()
                 Reason = - formattable by format_error/1 -

              Generates a lexical analyzer from the definition in the input
              file. The input file has the extension .xrl, which is added to
              the filename if it is not given. The resulting module is named
              after the input file, without the .xrl extension.

              The current options are:

                dfa_graph:
                  Generates a .dot file which contains a description of the
                  DFA in a format which can be viewed with Graphviz,
                  www.graphviz.com.

                {includefile,Includefile}:
                  Uses a specific or customised prologue file instead of the
                  default lib/parsetools/include/leexinc.hrl, which is
                  otherwise included.

                {report_errors, bool()}:
                  Causes errors to be printed as they occur. Default is true.

                {report_warnings, bool()}:
                  Causes warnings to be printed as they occur. Default is
                  true.

                warnings_as_errors:
                  Causes warnings to be treated as errors.

                {report, bool()}:
                  This is a short form for both report_errors and
                  report_warnings.

                {return_errors, bool()}:
                  If this flag is set, {error, Errors, Warnings} is returned
                  when there are errors. Default is false.

                {return_warnings, bool()}:
                  If this flag is set, an extra field containing Warnings is
                  added to the tuple returned upon success. Default is false.

                {return, bool()}:
                  This is a short form for both return_errors and
                  return_warnings.

                {scannerfile, Scannerfile}:
                  Scannerfile is the name of the file that will contain the
                  Erlang scanner code that is generated. The default ("") is
                  to add the extension .erl to FileName stripped of the .xrl
                  extension.

                {verbose, bool()}:
                  Outputs information from parsing the input file and
                  generating the internal tables.

              Any of the Boolean options can be set to true by stating the
              name of the option. For example, verbose is equivalent to
              {verbose, true}.

              Leex will add the extension .hrl to the Includefile name and the
              extension .erl to the Scannerfile name, unless the extension is
              already there.
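
              The options above can be combined as in the following minimal
              sketch. The module name leex_file_example and the file name
              example_scanner.xrl are placeholders invented for this
              illustration. The sketch writes a tiny definition file and asks
              for errors and warnings to be returned rather than printed.

```erlang
-module(leex_file_example).
-export([generate/0]).

%% Writes a small definition file and generates a scanner from it.
%% "example_scanner.xrl" is a placeholder name chosen for this sketch.
generate() ->
    Def = ["Definitions.\n\n",
           "D = [0-9]\n\n",
           "Rules.\n\n",
           "{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.\n",
           "[\\s\\t\\n]+ : skip_token.\n\n",
           "Erlang code.\n"],
    ok = file:write_file("example_scanner.xrl", Def),
    %% The .xrl extension may be left out; leex adds it.
    %% {return, true} is short for return_errors and return_warnings,
    %% {report, false} switches off printing of errors and warnings.
    case leex:file("example_scanner", [{return, true}, {report, false}]) of
        {ok, ScannerFile, Warnings} ->
            {ok, ScannerFile, Warnings};
        {error, Errors, Warnings} ->
            {error, Errors, Warnings}
    end.
```

              With return_warnings in effect, a successful call is expected to
              carry the extra Warnings field, so both clauses return a
              three-element tuple.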

       format_error(ErrorInfo) -> Chars

              Types:

                 Chars = [char() | Chars]

              Returns a string which describes the error ErrorInfo returned
              when there is an error in a regular expression.
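
              One possible way to turn the Errors or Warnings list returned by
              file/2 into printable text is sketched below. It assumes the
              usual convention that the module named in each ErrorInfo tuple
              formats the Reason part through its own format_error/1. The
              module name leex_errors_example is hypothetical.

```erlang
-module(leex_errors_example).
-export([describe/1]).

%% FileErrors has the shape [{filename(), [ErrorInfo]}], where
%% ErrorInfo = {ErrorLine, module(), Reason}.
%% Returns one flat string per reported error or warning.
describe(FileErrors) ->
    [lists:flatten(io_lib:format("~ts:~w: ~ts",
                                 [File, Line, Mod:format_error(Reason)]))
     || {File, Infos} <- FileErrors,
        {Line, Mod, Reason} <- Infos].
```

              The line is printed with ~w because it is not assumed to always
              be an integer.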

GENERATED SCANNER EXPORTS

       The following functions are exported by the generated scanner.

EXPORTS

       string(String) -> StringRet
       string(String, StartLine) -> StringRet

              Types:

                 String = string()
                 StringRet = {ok,Tokens,EndLine} | ErrorInfo
                 Tokens = [Token]
                 EndLine = StartLine = integer()

              Scans String and returns all the tokens in it, or an error.

          Note:
              It is an error if not all of the characters in String are
              consumed.
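
              A sketch of calling string/1 on a freshly generated scanner
              follows. The names leex_string_example and num_scanner are
              invented for this example, and loading the compiled code with
              code:load_binary/3 is just one way to make the module callable.

```erlang
-module(leex_string_example).
-export([run/0]).

%% Generates, compiles and loads a small scanner, then scans a string.
%% "num_scanner" is a placeholder name chosen for this sketch.
run() ->
    Def = ["Definitions.\n\n",
           "D = [0-9]\n\n",
           "Rules.\n\n",
           "{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.\n",
           "[\\s\\n]+ : skip_token.\n\n",
           "Erlang code.\n"],
    ok = file:write_file("num_scanner.xrl", Def),
    {ok, ErlFile} = leex:file("num_scanner"),
    %% Compile to a binary and load it, so no code path setup is needed.
    {ok, Mod, Bin} = compile:file(ErlFile, [binary]),
    {module, Mod} = code:load_binary(Mod, ErlFile, Bin),
    %% Every character must be consumed by some rule, hence the
    %% skip_token rule for blanks and newlines above.
    Mod:string("12 345\n6").
```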

       token(Cont, Chars) -> {more,Cont1} | {done,TokenRet,RestChars}
       token(Cont, Chars, StartLine) -> {more,Cont1} |
       {done,TokenRet,RestChars}

              Types:

                 Cont = [] | Cont1
                 Cont1 = tuple()
                 Chars = RestChars = string() | eof
                 TokenRet = {ok, Token, EndLine} | {eof, EndLine} | ErrorInfo
                 StartLine = EndLine = integer()

              This is a re-entrant call to try and scan one token from Chars.
              If there are enough characters in Chars to either scan a token
              or detect an error, then this will be returned with {done,...}.
              Otherwise {more,Cont1} will be returned, where Cont1 is passed
              as Cont in the next call to token() with more characters to try
              and scan the token. This is continued until a token has been
              scanned. Cont is initially [].

              It is not designed to be called directly by an application, but
              used through the i/o system, where it can typically be called in
              an application by:

              io:request(InFile, {get_until,Prompt,Module,token,[Line]})
                -> TokenRet
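
              Although token/2 is meant to be driven by the i/o system, the
              continuation protocol can be illustrated with a small loop that
              feeds input in chunks. This is only a sketch. The module name
              leex_token_example is hypothetical, and Scanner stands for any
              leex-generated module.

```erlang
-module(leex_token_example).
-export([next_token/3]).

%% Scanner is the name of a leex-generated module (hypothetical here).
%% Chunks is a list of strings that together form the input.
%% Returns {TokenRet, RestChars, UnusedChunks}.
next_token(Scanner, Cont, [Chunk | Chunks]) ->
    case Scanner:token(Cont, Chunk) of
        {more, Cont1} ->
            %% Not enough characters yet: continue with the next chunk.
            next_token(Scanner, Cont1, Chunks);
        {done, TokenRet, RestChars} ->
            {TokenRet, RestChars, Chunks}
    end;
next_token(Scanner, Cont, []) ->
    %% Input exhausted: pass eof so that the scanner can finish.
    case Scanner:token(Cont, eof) of
        {done, TokenRet, RestChars} -> {TokenRet, RestChars, []};
        {more, _Cont1} -> {error, unexpected_more}
    end.
```

              The first call passes [] as Cont, for example
              next_token(my_scanner, [], ["12", "3 4"]), where my_scanner is a
              placeholder. RestChars has to be placed in front of the unused
              chunks before scanning the next token.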

       tokens(Cont, Chars) -> {more,Cont1} | {done,TokensRet,RestChars}
       tokens(Cont, Chars, StartLine) -> {more,Cont1} |
       {done,TokensRet,RestChars}

              Types:

                 Cont = [] | Cont1
                 Cont1 = tuple()
                 Chars = RestChars = string() | eof
                 TokensRet = {ok, Tokens, EndLine} | {eof, EndLine} |
                 ErrorInfo
                 Tokens = [Token]
                 StartLine = EndLine = integer()

              This is a re-entrant call to try and scan tokens from Chars. If
              there are enough characters in Chars to either scan tokens or
              detect an error, then this will be returned with {done,...}.
              Otherwise {more,Cont1} will be returned, where Cont1 is passed
              as Cont in the next call to tokens() with more characters to try
              and scan the tokens. This is continued until all tokens have
              been scanned. Cont is initially [].

              This function differs from token in that it will continue to
              scan tokens up to and including an {end_token,Token} (see next
              section). It will then return all the tokens. This is typically
              used for scanning grammars like Erlang where there is an
              explicit end token, '.'. If no end token is found then the whole
              file will be scanned and returned. If an error occurs then all
              tokens up to and including the next end token will be skipped.

              It is not designed to be called directly by an application, but
              used through the i/o system, where it can typically be called in
              an application by:

              io:request(InFile, {get_until,Prompt,Module,tokens,[Line]})
                -> TokensRet
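
              The io:request call above could be wrapped in a loop along the
              following lines. This is a sketch under assumptions: the module
              name leex_tokens_example is hypothetical, Scanner is any
              leex-generated module, the empty atom '' is used as prompt, and
              anything other than the two documented success shapes is simply
              handed back as an error.

```erlang
-module(leex_tokens_example).
-export([read_all/2]).

%% Reads FileName through the i/o system, one tokens/3 result at a time.
%% Returns {ok, [Tokens]} with one token list per successful request.
read_all(Scanner, FileName) ->
    {ok, InFile} = file:open(FileName, [read]),
    try
        loop(Scanner, InFile, 1, [])
    after
        file:close(InFile)
    end.

loop(Scanner, InFile, Line, Acc) ->
    case io:request(InFile, {get_until, '', Scanner, tokens, [Line]}) of
        {ok, Tokens, EndLine} ->
            %% Continue on the line where the previous request stopped.
            loop(Scanner, InFile, EndLine, [Tokens | Acc]);
        {eof, _EndLine} ->
            {ok, lists:reverse(Acc)};
        Other ->
            {error, Other}
    end.
```

              For a grammar with an end token, each request yields the tokens
              of one unit up to that end token. Without one, a single request
              is expected to return the tokens of the whole file.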

INPUT FILE FORMAT

       Erlang style comments starting with a % are allowed in scanner files. A
       definition file has the following format:

       <Header>

       Definitions.

       <Macro Definitions>

       Rules.

       <Token Rules>

       Erlang code.

       <Erlang code>

       The "Definitions.", "Rules." and "Erlang code." headings are mandatory
       and must occur at the beginning of a source line. The <Header>, <Macro
       Definitions> and <Erlang code> sections may be empty but there must be
       at least one rule.

       Macro definitions have the following format:

       NAME = VALUE

       and there must be spaces around =. Macros can be used in the regular
       expressions of rules by writing {NAME}.

   Note:
       When macros are expanded in expressions the macro calls are replaced by
       the macro value without any form of quoting or enclosing in
       parentheses.
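
       The consequence of this textual expansion is easiest to see with an
       alternation. The definition file below is an illustrative sketch; the
       macro names AB and ABG and the token names are invented for it.

```erlang
Definitions.

% AB is substituted as the bare text a|b.
AB = a|b
% ABG carries its own parentheses, so it stays grouped after expansion.
ABG = (a|b)

Rules.

% Expands to x(a|b)c and matches "xac" or "xbc".
x{ABG}c : {token,{grouped,TokenLine,TokenChars}}.
% Expands to ya|bc and matches "ya" or "bc", not "yac" or "ybc".
y{AB}c : {token,{ungrouped,TokenLine,TokenChars}}.

Erlang code.
```

       Putting the parentheses into the macro value, as in ABG, is one way to
       keep a macro usable inside larger expressions.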

       Rules have the following format:

       <Regexp> : <Erlang code>.

       The <Regexp> must occur at the start of a line and not include any
       blanks; use \t and \s to include TAB and SPACE characters in the
       regular expression. If <Regexp> matches then the corresponding <Erlang
       code> is evaluated to generate a token. Within the Erlang code the
       following predefined variables are available:

         TokenChars:
           A list of the characters in the matched token.

         TokenLen:
           The number of characters in the matched token.

         TokenLine:
           The line number where the token occurred.

       The code must return:

         {token,Token}:
           Return Token to the caller.

         {end_token,Token}:
           Return Token to the caller; it is the last token in a tokens call.

         skip_token:
           Skip this token completely.

         {error,ErrString}:
           An error in the token; ErrString is a string describing the error.

       It is also possible to push back characters into the input characters
       with the following returns:

         * {token,Token,PushBackList}

         * {end_token,Token,PushBackList}

         * {skip_token,PushBackList}

       These have the same meanings as the normal returns but the characters
       in PushBackList will be prepended to the input characters and scanned
       for the next token. Note that pushing back a newline will mean the line
       numbering will no longer be correct.

   Note:
       Pushing back characters gives you unexpected possibilities to cause the
       scanner to loop!
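
       As a sketch of how a push-back return might be used, the definition
       file below treats a name directly followed by "(" as a call and pushes
       the parenthesis back so that it is scanned again as its own token. The
       macro name L and the token names are invented for this illustration.

```erlang
Definitions.

L = [a-z]

Rules.

% A name directly followed by "(" becomes a call token. The "(" is pushed
% back so that the next rule scans it as a token of its own.
{L}+\( : {token,{call,TokenLine,lists:sublist(TokenChars,TokenLen-1)},"("}.
\( : {token,{'(',TokenLine}}.
\) : {token,{')',TokenLine}}.
{L}+ : {token,{name,TokenLine,TokenChars}}.
[\s\t\n]+ : skip_token.

Erlang code.
```

       Each match of the first rule still consumes at least the name, so the
       pushed-back character cannot make this scanner loop.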

       The following example would match a simple Erlang integer or float and
       return a token which could be sent to the Erlang parser:

       D = [0-9]

       {D}+ :
         {token,{integer,TokenLine,list_to_integer(TokenChars)}}.

       {D}+\.{D}+((E|e)(\+|\-)?{D}+)? :
         {token,{float,TokenLine,list_to_float(TokenChars)}}.

       The Erlang code in the "Erlang code." section is written into the
       output file directly after the module declaration and predefined
       exports declaration, so it is possible to add extra exports, define
       imports and other attributes which are then visible in the whole file.
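
       A definition file using the "Erlang code." section for an extra export
       and a helper function might look like the sketch below. The function
       names is_keyword/1 and word_token/2 and the keyword set are made up
       for this example.

```erlang
Definitions.

L = [a-z]

Rules.

{L}+ : {token,word_token(TokenChars,TokenLine)}.
[\s\t\n]+ : skip_token.

Erlang code.

% An extra export in addition to the predefined scanner exports.
-export([is_keyword/1]).

is_keyword("if") -> true;
is_keyword("end") -> true;
is_keyword(_) -> false.

% Helper called from the rule above.
word_token(Chars, Line) ->
    case is_keyword(Chars) of
        true -> {list_to_atom(Chars), Line};
        false -> {atom, Line, list_to_atom(Chars)}
    end.
```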

REGULAR EXPRESSIONS

       The regular expressions allowed here are a subset of the set found in
       egrep and in the AWK programming language, as defined in the book The
       AWK Programming Language, by A. V. Aho, B. W. Kernighan, P. J.
       Weinberger. They are composed of the following characters:

         c:
           Matches the non-metacharacter c.

         \c:
           Matches the escape sequence or literal character c.

         .:
           Matches any character.

         ^:
           Matches the beginning of a string.

         $:
           Matches the end of a string.

         [abc...]:
           Character class, which matches any of the characters abc....
           Character ranges are specified by a pair of characters separated
           by a -.

         [^abc...]:
           Negated character class, which matches any character except abc....

         r1 | r2:
           Alternation. It matches either r1 or r2.

         r1r2:
           Concatenation. It matches r1 and then r2.

         r+:
           Matches one or more rs.

         r*:
           Matches zero or more rs.

         r?:
           Matches zero or one rs.

         (r):
           Grouping. It matches r.

       The escape sequences allowed are the same as for Erlang strings:

         \b:
           Backspace.

         \f:
           Form feed.

         \n:
           Newline (line feed).

         \r:
           Carriage return.

         \t:
           Tab.

         \e:
           Escape.

         \v:
           Vertical tab.

         \s:
           Space.

         \d:
           Delete.

         \ddd:
           The octal value ddd.

         \xhh:
           The hexadecimal value hh.

         \x{h...}:
           The hexadecimal value h....

         \c:
           Any other character literally, for example \\ for backslash, \" for
           ".

       The following examples define simplified versions of a few Erlang data
       types:

       Atoms [a-z][0-9a-zA-Z_]*

       Variables [A-Z_][0-9a-zA-Z_]*

       Floats (\+|-)?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)?

   Note:
       Anchoring a regular expression with ^ and $ is not implemented in the
       current version of Leex and just generates a parse error.

Ericsson AB                    parsetools 2.1.6                        leex(3)