erl_scan(3)                Erlang Module Definition                erl_scan(3)

NAME

       erl_scan - The Erlang token scanner.

DESCRIPTION

       This module contains functions for tokenizing (scanning) characters
       into Erlang tokens.

DATA TYPES

       category() = atom()

       error_description() = term()

       error_info() =
           {erl_anno:location(), module(), error_description()}

       option() =
           return | return_white_spaces | return_comments | text |
           {reserved_word_fun, resword_fun()} |
           {text_fun, text_fun()}

       options() = option() | [option()]

       symbol() = atom() | float() | integer() | string()

       resword_fun() = fun((atom()) -> boolean())

       token() =
           {category(), Anno :: erl_anno:anno(), symbol()} |
           {category(), Anno :: erl_anno:anno()}

       tokens() = [token()]

       tokens_result() =
           {ok, Tokens :: tokens(), EndLocation :: erl_anno:location()} |
           {eof, EndLocation :: erl_anno:location()} |
           {error,
            ErrorInfo :: error_info(),
            EndLocation :: erl_anno:location()}

       text_fun() = fun((atom(), string()) -> boolean())

EXPORTS

       category(Token) -> category()

              Types:

                 Token = token()

              Returns the category of Token.

       column(Token) -> erl_anno:column() | undefined

              Types:

                 Token = token()

              Returns the column of Token's collection of annotations.

       end_location(Token) -> erl_anno:location() | undefined

              Types:

                 Token = token()

              Returns the end location of the text of Token's collection of
              annotations. If there is no text, undefined is returned.

       format_error(ErrorDescriptor) -> string()

              Types:

                 ErrorDescriptor = error_description()

              Uses an ErrorDescriptor and returns a string that describes the
              error or warning. This function is usually called implicitly
              when an ErrorInfo structure is processed (see section Error
              Information).

       line(Token) -> erl_anno:line()

              Types:

                 Token = token()

              Returns the line of Token's collection of annotations.

       location(Token) -> erl_anno:location()

              Types:

                 Token = token()

              Returns the location of Token's collection of annotations.

       reserved_word(Atom :: atom()) -> boolean()

              Returns true if Atom is an Erlang reserved word, otherwise
              false.
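
              As a minimal sketch of how this predicate can be used, the
              module below partitions a list of atoms into reserved words
              and ordinary atoms. The module name scan_reserved and the
              function split/1 are hypothetical and not part of erl_scan.

```erlang
-module(scan_reserved).
-export([split/1]).

%% Hypothetical helper: split a list of atoms into
%% {ReservedWords, OtherAtoms} using erl_scan:reserved_word/1.
split(Atoms) ->
    lists:partition(fun erl_scan:reserved_word/1, Atoms).
```

              For example, scan_reserved:split(['case', foo, 'end', bar])
              returns {['case','end'],[foo,bar]}.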

       string(String) -> Return

       string(String, StartLocation) -> Return

       string(String, StartLocation, Options) -> Return

              Types:

                 String = string()
                 Options = options()
                 Return =
                     {ok, Tokens :: tokens(), EndLocation} |
                     {error, ErrorInfo :: error_info(), ErrorLocation}
                 StartLocation = EndLocation = ErrorLocation =
                     erl_anno:location()

              Takes the list of characters String and tries to scan (tokenize)
              them. Returns one of the following:

                {ok, Tokens, EndLocation}:
                  Tokens are the Erlang tokens from String. EndLocation is the
                  first location after the last token.

                {error, ErrorInfo, ErrorLocation}:
                  An error occurred. ErrorLocation is the first location after
                  the erroneous token.
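
              The two return shapes above can be handled with a single case
              expression. The following is a minimal sketch; the module name
              scan_basic and the function scan/1 are hypothetical and not
              part of erl_scan.

```erlang
-module(scan_basic).
-export([scan/1]).

%% Hypothetical wrapper around erl_scan:string/1 that drops the
%% end location and keeps only the tokens or the error information.
scan(String) ->
    case erl_scan:string(String) of
        {ok, Tokens, _EndLocation} ->
            {ok, Tokens};
        {error, ErrorInfo, _ErrorLocation} ->
            {error, ErrorInfo}
    end.
```

              With the default start location (line 1),
              scan_basic:scan("hello(World).") returns
              {ok,[{atom,1,hello},{'(',1},{var,1,'World'},{')',1},{dot,1}]}.
              An unterminated string such as "\"abc" yields an error tuple
              instead.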

              string(String) is equivalent to string(String, 1), and
              string(String, StartLocation) is equivalent to string(String,
              StartLocation, []).

              StartLocation indicates the initial location when scanning
              starts. If StartLocation is a line, Anno, EndLocation, and
              ErrorLocation are lines. If StartLocation is a pair of a line
              and a column, Anno takes the form of an opaque compound data
              type, and EndLocation and ErrorLocation are pairs of a line and
              a column. The token annotations contain information about the
              column and the line where the token begins, as well as the text
              of the token (if option text is specified), all of which can be
              accessed by calling column/1, line/1, location/1, and text/1.
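
              Because the annotation is opaque when columns are requested,
              the accessor functions are the safe way to read it. The sketch
              below scans from line 1, column 1 with option text and reports
              each token through the accessors. The module name
              scan_locations and the function describe/1 are hypothetical,
              and the sketch assumes the input scans without errors.

```erlang
-module(scan_locations).
-export([describe/1]).

%% Hypothetical helper: scan String starting at {Line, Column} = {1, 1}
%% with option text, and return one map per token built only from the
%% erl_scan accessor functions. Crashes with badmatch on a scan error.
describe(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String, {1, 1}, [text]),
    [#{category => erl_scan:category(T),
       line     => erl_scan:line(T),
       column   => erl_scan:column(T),
       text     => erl_scan:text(T)} || T <- Tokens].
```

              For the input "foo(Bar).", the first map describes the atom
              foo at line 1, column 1 with text "foo", and the third
              describes the variable Bar at column 5.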

              A token is a tuple containing information about syntactic
              category, the token annotations, and the terminal symbol. For
              punctuation characters (such as ; and |) and reserved words, the
              category and the symbol coincide, and the token is represented
              by a two-tuple. Three-tuples have one of the following forms:

                * {atom, Anno, atom()}

                * {char, Anno, char()}

                * {comment, Anno, string()}

                * {float, Anno, float()}

                * {integer, Anno, integer()}

                * {string, Anno, string()}

                * {var, Anno, atom()}

                * {white_space, Anno, string()}
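
              The difference between two-tuples and three-tuples can be
              hidden behind category/1 and symbol/1. The following sketch
              pairs the category and symbol of every token; the module name
              scan_shapes and the function pairs/1 are hypothetical, and the
              input is assumed to scan without errors.

```erlang
-module(scan_shapes).
-export([pairs/1]).

%% Hypothetical helper: return {Category, Symbol} for every token.
%% For two-tuple tokens (punctuation and reserved words) the category
%% and the symbol coincide.
pairs(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    [{erl_scan:category(T), erl_scan:symbol(T)} || T <- Tokens].
```

              For example, scan_shapes:pairs("X = 1.5.") returns
              [{var,'X'},{'=','='},{float,1.5},{dot,dot}].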

              Valid options:

                {reserved_word_fun, resword_fun()}:
                  A callback function that is called when the scanner has
                  found an unquoted atom. If the function returns true, the
                  unquoted atom itself becomes the category of the token. If
                  the function returns false, atom becomes the category of the
                  unquoted atom.

                return_comments:
                  Return comment tokens.

                return_white_spaces:
                  Return white space tokens. By convention, a newline
                  character, if present, is always the first character of the
                  text (there cannot be more than one newline in a white space
                  token).

                return:
                  Short for [return_comments, return_white_spaces].

                text:
                  Include the token text in the token annotation. The text is
                  the part of the input corresponding to the token. See also
                  text_fun.

                {text_fun, text_fun()}:
                  A callback function used to determine whether the full text
                  for the token is to be included in the token annotation. The
                  arguments of the function are the category of the token and
                  the full token string. This option is only used when text is
                  not present. If neither is present, the text is not saved in
                  the token annotation.
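
              The sketch below exercises three of the options described
              above. The module name scan_options, its three functions, and
              the extra keyword unless are all hypothetical and chosen only
              for illustration. The text_fun part follows the description
              above and is an assumption about behaviour rather than a
              verified result.

```erlang
-module(scan_options).
-export([with_keyword/1, comments/1, integer_texts/1]).

%% reserved_word_fun: treat the (hypothetical) atom unless as a
%% reserved word in addition to the standard Erlang reserved words.
with_keyword(String) ->
    IsReserved = fun(unless) -> true;
                    (Atom)   -> erl_scan:reserved_word(Atom)
                 end,
    erl_scan:string(String, 1, [{reserved_word_fun, IsReserved}]).

%% return_comments: collect the text of every comment token.
%% Assumes the input scans without errors.
comments(String) ->
    {ok, Tokens, _EndLocation} =
        erl_scan:string(String, 1, [return_comments]),
    [Text || {comment, _Anno, Text} <- Tokens].

%% text_fun: keep the source text only for integer tokens, for
%% example to see the base notation that was used in the input.
%% Assumes the input scans without errors.
integer_texts(String) ->
    KeepText = fun(Category, _TokenString) -> Category =:= integer end,
    {ok, Tokens, _EndLocation} =
        erl_scan:string(String, 1, [{text_fun, KeepText}]),
    [erl_scan:text(T) || T <- Tokens, erl_scan:category(T) =:= integer].
```

              Note that a reserved_word_fun replaces the default check, so
              the sketch delegates to reserved_word/1 for all other atoms.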

       symbol(Token) -> symbol()

              Types:

                 Token = token()

              Returns the symbol of Token.

       text(Token) -> erl_anno:text() | undefined

              Types:

                 Token = token()

              Returns the text of Token's collection of annotations. If there
              is no text, undefined is returned.

       tokens(Continuation, CharSpec, StartLocation) -> Return

       tokens(Continuation, CharSpec, StartLocation, Options) -> Return

              Types:

                 Continuation = return_cont() | []
                 CharSpec = char_spec()
                 StartLocation = erl_anno:location()
                 Options = options()
                 Return =
                     {done,
                      Result :: tokens_result(),
                      LeftOverChars :: char_spec()} |
                     {more, Continuation1 :: return_cont()}
                 char_spec() = string() | eof
                 return_cont()
                   An opaque continuation.

              This is the re-entrant scanner, which scans characters until
              either a dot ('.' followed by a white space) or eof is reached.
              It returns:

                {done, Result, LeftOverChars}:
                  Indicates that there is sufficient input data to get a
                  result. Result is:

                  {ok, Tokens, EndLocation}:
                    The scanning was successful. Tokens is the list of tokens
                    including dot.

                  {eof, EndLocation}:
                    End of file was encountered before any more tokens.

                  {error, ErrorInfo, EndLocation}:
                    An error occurred. LeftOverChars is the remaining
                    characters of the input data, starting from EndLocation.

                {more, Continuation1}:
                  More data is required for building a term. Continuation1
                  must be passed in a new call to tokens/3,4 when more data is
                  available.

              The CharSpec eof signals end of file. LeftOverChars then takes
              the value eof as well.

              tokens(Continuation, CharSpec, StartLocation) is equivalent to
              tokens(Continuation, CharSpec, StartLocation, []).

              For a description of the options, see string/3.
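
              The protocol above can be driven by a small loop that feeds
              input chunk by chunk and restarts with [] after each completed
              form. The following is a sketch only; the module name
              scan_stream and the function forms/1 are hypothetical, and the
              input is assumed to arrive as a list of strings.

```erlang
-module(scan_stream).
-export([forms/1]).

%% Hypothetical driver for the re-entrant scanner. Chunks is a list of
%% strings; eof is appended to signal the end of the input. Returns
%% {ok, [Tokens]} with one token list per dot-terminated form, or
%% {error, ErrorInfo} for the first scan error.
forms(Chunks) ->
    loop([], Chunks ++ [eof], 1, []).

loop(Cont, [Chunk | Rest], Location, Acc) ->
    case erl_scan:tokens(Cont, Chunk, Location) of
        {more, Cont1} ->
            %% Not enough input yet: pass the continuation on together
            %% with the next chunk.
            loop(Cont1, Rest, Location, Acc);
        {done, {ok, Tokens, EndLocation}, LeftOver} ->
            %% One form is complete: start over with [] and rescan the
            %% leftover characters first.
            loop([], [LeftOver | Rest], EndLocation, [Tokens | Acc]);
        {done, {eof, _EndLocation}, _LeftOver} ->
            {ok, lists:reverse(Acc)};
        {done, {error, ErrorInfo, _EndLocation}, _LeftOver} ->
            {error, ErrorInfo}
    end.
```

              A form may span several chunks: given the chunks "foo",
              "(1). ba", and "r. ", the sketch is expected to produce two
              token lists, one for foo(1). and one for bar.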

ERROR INFORMATION

       ErrorInfo is the standard ErrorInfo structure that is returned from all
       I/O modules. The format is as follows:

       {ErrorLocation, Module, ErrorDescriptor}

       A string describing the error is obtained with the following call:

       Module:format_error(ErrorDescriptor)
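
       As a minimal sketch of this convention, the module below turns a scan
       failure into a location and a readable message. The module name
       scan_errors and the function message/1 are hypothetical; only
       erl_scan:string/1 and the format_error/1 callback come from the text
       above.

```erlang
-module(scan_errors).
-export([message/1]).

%% Hypothetical helper: return none if String scans cleanly, otherwise
%% {ErrorLocation, Message}, where Message is obtained by calling
%% Module:format_error/1 on the error descriptor.
message(String) ->
    case erl_scan:string(String) of
        {ok, _Tokens, _EndLocation} ->
            none;
        {error, {ErrorLocation, Module, ErrorDescriptor}, _EndLocation} ->
            Message = lists:flatten(Module:format_error(ErrorDescriptor)),
            {ErrorLocation, Message}
    end.
```

       Calling the formatter through Module rather than through erl_scan
       directly keeps the helper usable with ErrorInfo values produced by
       other modules that follow the same convention.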

NOTES

       The continuation of the first call to the re-entrant input functions
       must be []. For a complete description of how the re-entrant input
       scheme works, see Armstrong, Virding and Williams: 'Concurrent
       Programming in Erlang', Chapter 13.

SEE ALSO

       erl_anno(3), erl_parse(3), io(3)

Ericsson AB                       stdlib 4.2                       erl_scan(3)