erl_scan(3)                Erlang Module Definition                erl_scan(3)

NAME

       erl_scan - The Erlang token scanner.

DESCRIPTION

       This module contains functions for tokenizing (scanning) characters
       into Erlang tokens.

DATA TYPES

       category() = atom()

       error_description() = term()

       error_info() =
           {erl_anno:location(), module(), error_description()}

       option() =
           return | return_white_spaces | return_comments | text |
           {reserved_word_fun, resword_fun()} |
           {text_fun, text_fun()} |
           {compiler_internal, [term()]}

       options() = option() | [option()]

       symbol() = atom() | float() | integer() | string()

       resword_fun() = fun((atom()) -> boolean())

       token() =
           {category(), Anno :: erl_anno:anno(), symbol()} |
           {category(), Anno :: erl_anno:anno()}

       tokens() = [token()]

       tokens_result() =
           {ok, Tokens :: tokens(), EndLocation :: erl_anno:location()} |
           {eof, EndLocation :: erl_anno:location()} |
           {error,
            ErrorInfo :: error_info(),
            EndLocation :: erl_anno:location()}

       text_fun() = fun((atom(), string()) -> boolean())

EXPORTS

       category(Token) -> category()

              Types:

                 Token = token()

              Returns the category of Token.

       column(Token) -> erl_anno:column() | undefined

              Types:

                 Token = token()

              Returns the column of Token's collection of annotations, or
              undefined if the annotations hold no column (for example, when
              scanning started from a StartLocation that is only a line).

       end_location(Token) -> erl_anno:location() | undefined

              Types:

                 Token = token()

              Returns the end location of the text of Token's collection of
              annotations, that is, the first location after the token text.
              If there is no text, undefined is returned.

       format_error(ErrorDescriptor) -> string()

              Types:

                 ErrorDescriptor = error_description()

              Returns a string that describes the error or warning given by
              ErrorDescriptor. This function is usually called implicitly
              when an ErrorInfo structure is processed (see section Error
              Information).

       line(Token) -> erl_anno:line()

              Types:

                 Token = token()

              Returns the line of Token's collection of annotations.

       location(Token) -> erl_anno:location()

              Types:

                 Token = token()

              Returns the location of Token's collection of annotations.
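
              The accessors above read the annotations instead of depending
              on their representation. The sketch below (the module anno_demo
              and its function describe/2 are hypothetical) scans a string and
              collects each token's category, line, column, and location; it
              assumes the string scans without errors.

```erlang
-module(anno_demo).
-export([describe/2]).

%% Hypothetical helper: scan String from StartLocation and return
%% {Category, Line, Column, Location} for every token.
%% Crashes with badmatch if the string does not scan.
describe(String, StartLocation) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String, StartLocation),
    [{erl_scan:category(T), erl_scan:line(T), erl_scan:column(T),
      erl_scan:location(T)}
     || T <- Tokens].
```

              With a line as start location, column/1 returns undefined:
              describe("foo bar", 1) gives [{atom,1,undefined,1},
              {atom,1,undefined,1}], while describe("foo bar", {1,1}) gives
              [{atom,1,1,{1,1}},{atom,1,5,{1,5}}].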

       reserved_word(Atom :: atom()) -> boolean()

              Returns true if Atom is an Erlang reserved word, otherwise
              false.
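
              A minimal sketch of using reserved_word/1 to classify atoms,
              for example identifiers collected from generated code. The
              module resword_demo and its function classify/1 are
              hypothetical.

```erlang
-module(resword_demo).
-export([classify/1]).

%% Hypothetical helper: split Atoms into {ReservedWords, OtherAtoms},
%% keeping the original order within each group.
classify(Atoms) ->
    lists:partition(fun erl_scan:reserved_word/1, Atoms).
```

              For example, classify(['case', foo, 'receive', bar, 'end'])
              returns {['case','receive','end'],[foo,bar]}.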

       string(String) -> Return

       string(String, StartLocation) -> Return

       string(String, StartLocation, Options) -> Return

              Types:

                 String = string()
                 Options = options()
                 Return =
                     {ok, Tokens :: tokens(), EndLocation} |
                     {error, ErrorInfo :: error_info(), ErrorLocation}
                 StartLocation = EndLocation = ErrorLocation =
                     erl_anno:location()

              Takes the list of characters String and tries to scan (tokenize)
              them. Returns one of the following:

                {ok, Tokens, EndLocation}:
                  Tokens are the Erlang tokens from String. EndLocation is the
                  first location after the last token.

                {error, ErrorInfo, ErrorLocation}:
                  An error occurred. ErrorLocation is the first location after
                  the erroneous token.

              string(String) is equivalent to string(String, 1), and
              string(String, StartLocation) is equivalent to
              string(String, StartLocation, []).
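
              A minimal sketch of calling string/1 and of checking the
              equivalences stated above. The module string_demo and its
              functions are hypothetical.

```erlang
-module(string_demo).
-export([scan/1, same_result/1]).

%% Hypothetical helper: return {ok, Tokens}, or the error tuple unchanged.
scan(String) ->
    case erl_scan:string(String) of
        {ok, Tokens, _EndLocation} -> {ok, Tokens};
        {error, _ErrorInfo, _ErrorLocation} = Error -> Error
    end.

%% string/1 and string/2 are shorthands for string/3 with
%% StartLocation 1 and no options, so all three results are equal.
same_result(String) ->
    R = erl_scan:string(String, 1, []),
    R =:= erl_scan:string(String) andalso R =:= erl_scan:string(String, 1).
```

              For example, scan("X = 1 + 2.") returns
              {ok, [{var,1,'X'},{'=',1},{integer,1,1},{'+',1},
              {integer,1,2},{dot,1}]}.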

              StartLocation indicates the initial location when scanning
              starts. If StartLocation is a line, Anno, EndLocation, and
              ErrorLocation are lines. If StartLocation is a pair of a line and
              a column, Anno takes the form of an opaque compound data type,
              and EndLocation and ErrorLocation are pairs of a line and a
              column. The token annotations contain information about the
              column and the line where the token begins, as well as the text
              of the token (if option text is specified), all of which can be
              accessed by calling column/1, line/1, location/1, and text/1.
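
              The form of EndLocation follows the form of StartLocation. This
              sketch (the module location_demo is hypothetical) scans the same
              string once from line 1 and once from {1, 1}, assuming the
              string scans without errors.

```erlang
-module(location_demo).
-export([end_locations/1]).

%% Hypothetical helper: return {EndLocation from line 1,
%% EndLocation from {1, 1}} for the same input.
end_locations(String) ->
    {ok, _, LineEnd} = erl_scan:string(String, 1),
    {ok, _, PairEnd} = erl_scan:string(String, {1, 1}),
    {LineEnd, PairEnd}.
```

              For example, end_locations("foo\nbar") returns {2,{2,4}}: a
              plain line in the first case, a line and column in the second.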

              A token is a tuple containing the syntactic category, the token
              annotations, and the terminal symbol. For punctuation characters
              (such as ; and |) and reserved words, the category and the
              symbol coincide, and the token is represented by a two-tuple.
              Three-tuples have one of the following forms:

                * {atom, Anno, atom()}

                * {char, Anno, char()}

                * {comment, Anno, string()}

                * {float, Anno, float()}

                * {integer, Anno, integer()}

                * {string, Anno, string()}

                * {var, Anno, atom()}

                * {white_space, Anno, string()}
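
              The two token shapes can be told apart by tuple size. A minimal
              sketch (the module token_shapes is hypothetical) that separates
              two-tuple tokens from three-tuple tokens:

```erlang
-module(token_shapes).
-export([split/1]).

%% Hypothetical helper: return {TwoTuples, ThreeTuples}.
%% Two-tuples are punctuation and reserved words (category = symbol);
%% three-tuples carry a separate symbol such as a number or a name.
split(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    lists:partition(fun(T) -> tuple_size(T) =:= 2 end, Tokens).
```

              For example, split("case X of 1 -> ok end.") returns
              {[{'case',1},{'of',1},{'->',1},{'end',1},{dot,1}],
              [{var,1,'X'},{integer,1,1},{atom,1,ok}]}.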

              Valid options:

                {reserved_word_fun, resword_fun()}:
                  A callback function that is called when the scanner has
                  found an unquoted atom. If the function returns true, the
                  unquoted atom itself becomes the category of the token. If
                  the function returns false, the token gets the category
                  atom, with the unquoted atom as its symbol.

                return_comments:
                  Return comment tokens.

                return_white_spaces:
                  Return white space tokens. By convention, a newline
                  character, if present, is always the first character of the
                  text (there cannot be more than one newline in a white space
                  token).

                return:
                  Short for [return_comments, return_white_spaces].

                text:
                  Include the token text in the token annotation. The text is
                  the part of the input corresponding to the token. See also
                  text_fun.

                {text_fun, text_fun()}:
                  A callback function used to determine whether the full text
                  of the token is included in the token annotation. The
                  arguments of the function are the category of the token and
                  the full token string. The function is only used when option
                  text is not present. If neither option is present, the text
                  is not saved in the token annotation.

                {compiler_internal, [term()]}:
                  Pass compiler-internal options to the scanner. The set of
                  internal options understood by the scanner should be
                  considered experimental and can thus be changed at any time
                  without prior warning.

                  The following options are currently understood:

                  ssa_checks:
                    Tokenizes source code annotations used for encoding tests
                    on the BEAM SSA code produced by the compiler.
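
              A sketch of three of the options above. The module option_demo,
              its functions, and the extra keyword unless are hypothetical.
              The sketch assumes that a reserved_word_fun replaces the
              built-in check, so the fun falls back to reserved_word/1 for
              every atom other than unless.

```erlang
-module(option_demo).
-export([with_keyword/1, with_layout/1, atom_text_only/1]).

%% reserved_word_fun: also treat the hypothetical word 'unless' as
%% reserved. Assumption: this fun replaces the default check, so
%% all other atoms are delegated to erl_scan:reserved_word/1.
with_keyword(String) ->
    IsReserved = fun(unless) -> true;
                    (Atom) -> erl_scan:reserved_word(Atom)
                 end,
    {ok, Tokens, _} =
        erl_scan:string(String, 1, [{reserved_word_fun, IsReserved}]),
    Tokens.

%% return: keep white space and comment tokens; report categories only.
with_layout(String) ->
    {ok, Tokens, _} = erl_scan:string(String, 1, [return]),
    [erl_scan:category(T) || T <- Tokens].

%% text_fun: store the token text only for tokens of category atom.
atom_text_only(String) ->
    KeepAtoms = fun(Category, _Text) -> Category =:= atom end,
    {ok, Tokens, _} = erl_scan:string(String, 1, [{text_fun, KeepAtoms}]),
    [{erl_scan:category(T), erl_scan:text(T)} || T <- Tokens].
```

              Under these assumptions, with_keyword("unless X.") yields a
              two-tuple {unless,1} in place of {atom,1,unless}, and
              atom_text_only("foo 42") yields [{atom,"foo"},
              {integer,undefined}].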

       symbol(Token) -> symbol()

              Types:

                 Token = token()

              Returns the symbol of Token.
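
              Because category/1 and symbol/1 accept both token shapes, a
              caller need not match on tuple size. A minimal sketch (the module
              symbol_demo is hypothetical); for a two-tuple token the category
              and the symbol coincide, so both fields are equal.

```erlang
-module(symbol_demo).
-export([pairs/1]).

%% Hypothetical helper: return {Category, Symbol} for every token.
pairs(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    [{erl_scan:category(T), erl_scan:symbol(T)} || T <- Tokens].
```

              For example, pairs("f(1).") returns [{atom,f},{'(','('},
              {integer,1},{')',')'},{dot,dot}].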

       text(Token) -> erl_anno:text() | undefined

              Types:

                 Token = token()

              Returns the text of Token's collection of annotations. If there
              is no text, undefined is returned.
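
              With option text, each token records its source text, which
              also lets end_location/1 compute where the token ends. A
              minimal sketch (the module text_demo is hypothetical), scanning
              from {1, 1} so that locations include columns:

```erlang
-module(text_demo).
-export([spans/1]).

%% Hypothetical helper: return {Text, Location, EndLocation} per token.
%% Without the text option, text/1 and end_location/1 return undefined.
spans(String) ->
    {ok, Tokens, _} = erl_scan:string(String, {1, 1}, [text]),
    [{erl_scan:text(T), erl_scan:location(T), erl_scan:end_location(T)}
     || T <- Tokens].
```

              For example, spans("foo(Bar)") returns [{"foo",{1,1},{1,4}},
              {"(",{1,4},{1,5}},{"Bar",{1,5},{1,8}},{")",{1,8},{1,9}}].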

       tokens(Continuation, CharSpec, StartLocation) -> Return

       tokens(Continuation, CharSpec, StartLocation, Options) -> Return

              Types:

                 Continuation = return_cont() | []
                 CharSpec = char_spec()
                 StartLocation = erl_anno:location()
                 Options = options()
                 Return =
                     {done,
                      Result :: tokens_result(),
                      LeftOverChars :: char_spec()} |
                     {more, Continuation1 :: return_cont()}
                 char_spec() = string() | eof
                 return_cont()
                   An opaque continuation.

              This is the re-entrant scanner, which scans characters until
              either a dot ('.' followed by a white-space character) or eof is
              reached. It returns:

                {done, Result, LeftOverChars}:
                  Indicates that there is sufficient input data to get a
                  result. Result is:

                  {ok, Tokens, EndLocation}:
                    The scanning was successful. Tokens is the list of tokens,
                    including the final dot token. LeftOverChars holds the
                    input that follows the dot.

                  {eof, EndLocation}:
                    End of file was encountered before any more tokens.

                  {error, ErrorInfo, EndLocation}:
                    An error occurred. LeftOverChars is the remaining
                    characters of the input data, starting from EndLocation.

                {more, Continuation1}:
                  More data is required for building a term. Continuation1
                  must be passed in a new call to tokens/3,4 when more data is
                  available.

              Passing eof as CharSpec signals end of file. LeftOverChars then
              takes the value eof as well.

              tokens(Continuation, CharSpec, StartLocation) is equivalent to
              tokens(Continuation, CharSpec, StartLocation, []).

              For a description of the options, see string/3.
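
              A sketch of driving the re-entrant scanner with input that
              arrives in pieces. The module reentrant_demo and its functions
              are hypothetical. Each fresh scan starts with the continuation
              [], leftover characters are rescanned before the next chunk,
              and eof is appended so that scanning always terminates.

```erlang
-module(reentrant_demo).
-export([scan_chunks/1]).

%% Hypothetical helper: feed input chunks to erl_scan:tokens/3 and
%% return one token list per dot-terminated form, or {error, ErrorInfo}.
scan_chunks(Chunks) ->
    %% Append eof so the scanner always reaches end of input.
    loop(Chunks ++ [eof], [], 1, []).

loop([Chunk | Rest], Cont, Loc, Acc) ->
    %% Loc matters for a fresh scan (Cont =:= []); it is passed
    %% along unchanged while a continuation is pending.
    case erl_scan:tokens(Cont, Chunk, Loc) of
        {more, Cont1} ->
            %% No dot yet: keep the continuation and read the next chunk.
            loop(Rest, Cont1, Loc, Acc);
        {done, {ok, Tokens, EndLoc}, LeftOver} ->
            %% A complete form: rescan the leftover input from scratch.
            loop([LeftOver | Rest], [], EndLoc, [Tokens | Acc]);
        {done, {eof, _EndLoc}, _} ->
            lists:reverse(Acc);
        {done, {error, ErrorInfo, _EndLoc}, _} ->
            {error, ErrorInfo}
    end.
```

              For example, scan_chunks(["foo(X", ") -> X.\n"]) returns a
              single token list ending in {dot,1}, although the form was
              split across two chunks, and scan_chunks(["a. b.\n"]) returns
              two token lists, one per form.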

ERROR INFORMATION

       ErrorInfo is the standard ErrorInfo structure that is returned from all
       I/O modules. The format is as follows:

       {ErrorLocation, Module, ErrorDescriptor}

       A string describing the error is obtained with the following call:

       Module:format_error(ErrorDescriptor)
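
       A minimal sketch of turning a scan error into a message by calling
       format_error/1 on the Module named in the ErrorInfo tuple. The module
       error_demo and its function explain/1 are hypothetical, and the exact
       wording of the message is not specified here.

```erlang
-module(error_demo).
-export([explain/1]).

%% Hypothetical helper: return ok if String scans, otherwise a flat
%% "Location: message" string built from the ErrorInfo tuple.
explain(String) ->
    case erl_scan:string(String) of
        {ok, _Tokens, _EndLocation} ->
            ok;
        {error, {Location, Module, Descriptor}, _ErrorLocation} ->
            Message = Module:format_error(Descriptor),
            lists:flatten(io_lib:format("~w: ~ts", [Location, Message]))
    end.
```

       For example, explain("ok.") returns ok, while explain("\"abc"), an
       unterminated string, returns a message string.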

NOTES

       The continuation of the first call to the re-entrant input functions
       must be []. For a complete description of how the re-entrant input
       scheme works, see Armstrong, Virding and Williams: 'Concurrent
       Programming in Erlang', Chapter 13.

SEE ALSO

       erl_anno(3), erl_parse(3), io(3)

Ericsson AB                      stdlib 5.1.1                      erl_scan(3)