leex(3)                    Erlang Module Definition                    leex(3)

NAME

       leex - Lexical analyzer generator for Erlang

DESCRIPTION

       A regular expression based lexical analyzer generator for Erlang,
       similar to lex or flex.

   Note:
       The Leex module should be considered experimental as it will be subject
       to changes in future releases.

DATA TYPES

       ErrorInfo = {ErrorLine,module(),error_descriptor()}
       ErrorLine = integer()
       Token = tuple()

EXPORTS

       file(FileName) -> LeexRet
       file(FileName, Options) -> LeexRet

              Types:

                 FileName = filename()
                 Options = Option | [Option]
                 Option = - see below -
                 LeexRet = {ok, Scannerfile} | {ok, Scannerfile, Warnings} |
                           error | {error, Errors, Warnings}
                 Scannerfile = filename()
                 Warnings = Errors = [{filename(), [ErrorInfo]}]
                 ErrorInfo = {ErrorLine, module(), Reason}
                 ErrorLine = integer()
                 Reason = - formattable by format_error/1 -

              Generates a lexical analyzer from the definition in the input
              file. The input file has the extension .xrl, which is added to
              the filename if it is not given. The resulting module name is
              the input filename without the .xrl extension.

              The current options are:

                dfa_graph:
                  Generates a .dot file which contains a description of the
                  DFA in a format which can be viewed with Graphviz,
                  www.graphviz.com.

                {includefile,Includefile}:
                  Uses a specific or customised prologue file instead of the
                  default lib/parsetools/include/leexinc.hrl, which is
                  otherwise included.

                {report_errors, bool()}:
                  Causes errors to be printed as they occur. Default is true.

                {report_warnings, bool()}:
                  Causes warnings to be printed as they occur. Default is
                  true.

                warnings_as_errors:
                  Causes warnings to be treated as errors.

                {report, bool()}:
                  This is a short form for both report_errors and
                  report_warnings.

                {return_errors, bool()}:
                  If this flag is set, {error, Errors, Warnings} is returned
                  when there are errors. Default is false.

                {return_warnings, bool()}:
                  If this flag is set, an extra field containing Warnings is
                  added to the tuple returned upon success. Default is false.

                {return, bool()}:
                  This is a short form for both return_errors and
                  return_warnings.

                {scannerfile, Scannerfile}:
                  Scannerfile is the name of the file that will contain the
                  Erlang scanner code that is generated. The default ("") is
                  to add the extension .erl to FileName stripped of the .xrl
                  extension.

                {verbose, bool()}:
                  Outputs information from parsing the input file and
                  generating the internal tables.

              Any of the Boolean options can be set to true by stating the
              name of the option. For example, verbose is equivalent to
              {verbose, true}.

              Leex will add the extension .hrl to the Includefile name and the
              extension .erl to the Scannerfile name, unless the extension is
              already there.
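
              As an illustration of the options above, the following sketch
              writes a small definition file, generates a scanner with
              leex:file/2, compiles it, and scans a string. The module name
              leex_demo, the file name demo_scanner.xrl, and the rules in it
              are assumptions made for this example only.

```erlang
-module(leex_demo).
-export([run/0]).

%% Hypothetical end-to-end sketch: the file name "demo_scanner.xrl" and the
%% rules below are assumptions for illustration only.
run() ->
    Xrl = ["Definitions.\n\n",
           "D = [0-9]\n\n",
           "Rules.\n\n",
           "{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.\n",
           "[\\s\\t\\n]+ : skip_token.\n\n",
           "Erlang code.\n"],
    ok = file:write_file("demo_scanner.xrl", Xrl),
    %% {return, true} hands warnings and errors back to the caller;
    %% {report, false} suppresses printing them.
    Options = [{scannerfile, "demo_scanner.erl"},
               {return, true},
               {report, false}],
    case leex:file("demo_scanner", Options) of
        {ok, ScannerFile, _Warnings} ->
            %% Compile the generated scanner in memory and load it.
            {ok, Mod, Bin} = compile:file(ScannerFile, [binary]),
            {module, Mod} = code:load_binary(Mod, ScannerFile, Bin),
            Mod:string("12 34");
        {error, Errors, _Warnings} ->
            {error, Errors}
    end.
```

              Called from a directory where the files can be written,
              leex_demo:run() should return
              {ok,[{integer,1,12},{integer,1,34}],1}.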

       format_error(ErrorInfo) -> Chars

              Types:

                 Chars = [char() | Chars]

              Returns a string which describes the error ErrorInfo returned
              when there is an error in a regular expression.

GENERATED SCANNER EXPORTS

       The following functions are exported by the generated scanner.

EXPORTS

       Module:string(String) -> StringRet
       Module:string(String, StartLine) -> StringRet

              Types:

                 String = string()
                 StringRet = {ok,Tokens,EndLine} | ErrorInfo
                 Tokens = [Token]
                 EndLine = StartLine = integer()

              Scans String and returns all the tokens in it, or an error.

          Note:
              It is an error if not all of the characters in String are
              consumed.

       Module:token(Cont, Chars) -> {more,Cont1} | {done,TokenRet,RestChars}
       Module:token(Cont, Chars, StartLine) -> {more,Cont1} |
       {done,TokenRet,RestChars}

              Types:

                 Cont = [] | Cont1
                 Cont1 = tuple()
                 Chars = RestChars = string() | eof
                 TokenRet = {ok, Token, EndLine} | {eof, EndLine} | ErrorInfo
                 StartLine = EndLine = integer()

              This is a re-entrant call that tries to scan one token from
              Chars. If there are enough characters in Chars to either scan a
              token or detect an error, the result is returned as {done,...}.
              Otherwise {more,Cont1} is returned, where Cont1 is passed to the
              next call to token() together with more characters to try to
              scan the token. This is continued until a token has been
              scanned. Cont is initially [].

              It is not designed to be called directly by an application, but
              used through the i/o system, where it can typically be called in
              an application by:

              io:request(InFile, {get_until,unicode,Prompt,Module,token,[Line]})
                -> TokenRet
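
              A minimal sketch of the io:request call above, used in a loop
              that collects every token from a file. The module name
              token_reader and its function names are hypothetical; Scanner
              stands for any leex-generated module.

```erlang
-module(token_reader).
-export([read_all/2]).

%% Reads every token from FileName using the leex-generated module Scanner.
%% The module and function names here are assumptions for illustration.
read_all(Scanner, FileName) ->
    {ok, InFile} = file:open(FileName, [read]),
    try
        loop(Scanner, InFile, 1, [])
    after
        file:close(InFile)
    end.

loop(Scanner, InFile, Line, Acc) ->
    %% The i/o system prepends the continuation and the characters read to
    %% the argument list, so this calls Scanner:token(Cont, Chars, Line).
    case io:request(InFile, {get_until, unicode, '', Scanner, token, [Line]}) of
        {ok, Token, EndLine} ->
            loop(Scanner, InFile, EndLine, [Token | Acc]);
        {eof, _EndLine} ->
            {ok, lists:reverse(Acc)};
        Other ->
            %% Anything else is treated as an error and returned unchanged.
            {error, Other}
    end.
```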

       Module:tokens(Cont, Chars) -> {more,Cont1} | {done,TokensRet,RestChars}
       Module:tokens(Cont, Chars, StartLine) -> {more,Cont1} |
       {done,TokensRet,RestChars}

              Types:

                 Cont = [] | Cont1
                 Cont1 = tuple()
                 Chars = RestChars = string() | eof
                 TokensRet = {ok, Tokens, EndLine} | {eof, EndLine} |
                             ErrorInfo
                 Tokens = [Token]
                 StartLine = EndLine = integer()

              This is a re-entrant call that tries to scan tokens from Chars.
              If there are enough characters in Chars to either scan tokens or
              detect an error, the result is returned as {done,...}.
              Otherwise {more,Cont1} is returned, where Cont1 is passed to the
              next call to tokens() together with more characters to try to
              scan the tokens. This is continued until all tokens have been
              scanned. Cont is initially [].

              This function differs from token in that it continues to scan
              tokens up to and including an {end_token,Token} (see next
              section). It then returns all the tokens. This is typically
              used for scanning grammars like Erlang where there is an
              explicit end token, '.'. If no end token is found then the whole
              file will be scanned and returned. If an error occurs then all
              tokens up to and including the next end token will be skipped.

              It is not designed to be called directly by an application, but
              used through the i/o system, where it can typically be called in
              an application by:

              io:request(InFile, {get_until,unicode,Prompt,Module,tokens,[Line]})
                -> TokensRet
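
              To make the continuation protocol concrete, the following
              sketch drives tokens/2 by hand with input that arrives in
              chunks. The module name chunk_scan is hypothetical, and Scanner
              stands for any leex-generated module; it assumes the scanner
              answers {done,...} once it is given eof.

```erlang
-module(chunk_scan).
-export([scan_chunks/2]).

%% Feeds a list of strings (chunks) to Scanner:tokens/2 until the scanner
%% reports {done, ...}. Names are assumptions for illustration.
scan_chunks(Scanner, Chunks) ->
    feed(Scanner, [], Chunks).          % the initial continuation is []

feed(Scanner, Cont, [Chunk | Rest]) ->
    case Scanner:tokens(Cont, Chunk) of
        {more, Cont1} ->
            %% Not enough characters yet: pass the continuation on.
            feed(Scanner, Cont1, Rest);
        {done, TokensRet, RestChars} ->
            {TokensRet, RestChars, Rest}
    end;
feed(Scanner, Cont, []) ->
    %% Input exhausted: signal end of input with eof.
    {done, TokensRet, RestChars} = Scanner:tokens(Cont, eof),
    {TokensRet, RestChars, []}.
```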

INPUT FILE FORMAT

       Erlang style comments starting with a % are allowed in scanner files. A
       definition file has the following format:

       <Header>

       Definitions.

       <Macro Definitions>

       Rules.

       <Token Rules>

       Erlang code.

       <Erlang code>

       The "Definitions.", "Rules." and "Erlang code." headings are mandatory
       and must occur at the beginning of a source line. The <Header>, <Macro
       Definitions> and <Erlang code> sections may be empty but there must be
       at least one rule.
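
       A minimal sketch of a complete definition file following this layout.
       The macro names, token names and rules are assumptions chosen for
       illustration; the header and the Erlang code section are left empty
       apart from a comment.

```erlang
%% Header: only a comment in this sketch.

Definitions.

L = [a-z]
WS = [\s\t\n]

Rules.

{L}+ : {token,{word,TokenLine,TokenChars}}.
\. : {end_token,{dot,TokenLine}}.
{WS}+ : skip_token.

Erlang code.
```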

       Macro definitions have the following format:

       NAME = VALUE

       and there must be spaces around =. Macros can be used in the regular
       expressions of rules by writing {NAME}.

   Note:
       When macros are expanded in expressions the macro calls are replaced by
       the macro value without any form of quoting or enclosing in
       parentheses.
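
       The consequence of this textual expansion can be sketched as follows.
       The macro names AB and ABG and the token name are hypothetical.

```erlang
Definitions.

% {AB}c expands to a|bc, which matches "a" or "bc", not "ac" or "bc".
AB = a|b
% {ABG}c expands to (a|b)c, which matches "ac" or "bc".
ABG = (a|b)

Rules.

{ABG}c : {token,{ab_c,TokenLine,TokenChars}}.

Erlang code.
```

       Putting the parentheses in the macro value itself is one way to make
       the macro safe to combine with other expressions.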

       Rules have the following format:

       <Regexp> : <Erlang code>.

       The <Regexp> must occur at the start of a line and not include any
       blanks; use \t and \s to include TAB and SPACE characters in the
       regular expression. If <Regexp> matches then the corresponding <Erlang
       code> is evaluated to generate a token. Within the Erlang code the
       following predefined variables are available:

         TokenChars:
           A list of the characters in the matched token.

         TokenLen:
           The number of characters in the matched token.

         TokenLine:
           The line number where the token occurred.

       The code must return one of the following:

         {token,Token}:
           Return Token to the caller.

         {end_token,Token}:
           Return Token as the last token in a tokens call.

         skip_token:
           Skip this token completely.

         {error,ErrString}:
           An error in the token; ErrString is a string describing the error.

       It is also possible to push back characters into the input characters
       with the following returns:

         * {token,Token,PushBackList}

         * {end_token,Token,PushBackList}

         * {skip_token,PushBackList}

       These have the same meanings as the normal returns but the characters
       in PushBackList will be prepended to the input characters and scanned
       for the next token. Note that pushing back a newline will mean the line
       numbering will no longer be correct.

   Note:
       Pushing back characters gives you unexpected possibilities to cause the
       scanner to loop!
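
       A sketch of a push-back return, with hypothetical rules and token
       names: an integer directly followed by a full stop is matched as a
       whole, but only the digits become the token and the "." is pushed back.

```erlang
Rules.

% "12." is matched here; the trailing "." is pushed back so that it is
% scanned again and becomes the next token.
[0-9]+\. :
  {token,{integer,TokenLine,list_to_integer(lists:droplast(TokenChars))},"."}.

\. : {end_token,{dot,TokenLine}}.
```

       The pushed-back "." cannot match the first rule again, because that
       rule requires at least one digit, so this particular sketch does not
       loop.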

       The following example would match a simple Erlang integer or float and
       return a token which could be sent to the Erlang parser:

       D = [0-9]

       {D}+ :
         {token,{integer,TokenLine,list_to_integer(TokenChars)}}.

       {D}+\.{D}+((E|e)(\+|\-)?{D}+)? :
         {token,{float,TokenLine,list_to_float(TokenChars)}}.

       The Erlang code in the "Erlang code." section is written into the
       output file directly after the module declaration and predefined
       exports declaration, so it is possible to add extra exports, define
       imports and other attributes which are then visible in the whole file.
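
       As a sketch of how the "Erlang code." section can be used, the fragment
       below defines a helper that a rule calls, and exports it as well. The
       helper name strip_quotes and the token names are assumptions for
       illustration.

```erlang
Rules.

"[^"]*" : {token,{string,TokenLine,strip_quotes(TokenChars,TokenLen)}}.
[\s\t\n]+ : skip_token.

Erlang code.

% An extra export in addition to the predefined ones.
-export([strip_quotes/2]).

% Drops the first and last character (the quotes) of the matched text.
strip_quotes(TokenChars, TokenLen) ->
    lists:sublist(TokenChars, 2, TokenLen - 2).
```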

REGULAR EXPRESSIONS

       The regular expressions allowed here are a subset of the set found in
       egrep and in the AWK programming language, as defined in the book The
       AWK Programming Language, by A. V. Aho, B. W. Kernighan, and P. J.
       Weinberger. They are composed of the following characters:

         c:
           Matches the non-metacharacter c.

         \c:
           Matches the escape sequence or literal character c.

         .:
           Matches any character.

         ^:
           Matches the beginning of a string.

         $:
           Matches the end of a string.

         [abc...]:
           Character class, which matches any of the characters abc....
           Character ranges are specified by a pair of characters separated
           by a -.

         [^abc...]:
           Negated character class, which matches any character except abc....

         r1 | r2:
           Alternation. It matches either r1 or r2.

         r1r2:
           Concatenation. It matches r1 and then r2.

         r+:
           Matches one or more rs.

         r*:
           Matches zero or more rs.

         r?:
           Matches zero or one rs.

         (r):
           Grouping. It matches r.

       The escape sequences allowed are the same as for Erlang strings:

         \b:
           Backspace.

         \f:
           Form feed.

         \n:
           Newline (line feed).

         \r:
           Carriage return.

         \t:
           Tab.

         \e:
           Escape.

         \v:
           Vertical tab.

         \s:
           Space.

         \d:
           Delete.

         \ddd:
           The octal value ddd.

         \xhh:
           The hexadecimal value hh.

         \x{h...}:
           The hexadecimal value h....

         \c:
           Any other character literally, for example \\ for backslash, \" for
           ".

       The following examples define simplified versions of a few Erlang data
       types:

       Atoms [a-z][0-9a-zA-Z_]*

       Variables [A-Z_][0-9a-zA-Z_]*

       Floats (\+|-)?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)?

   Note:
       Anchoring a regular expression with ^ and $ is not implemented in the
       current version of Leex and just generates a parse error.



Ericsson AB                     parsetools 2.2                         leex(3)