erl_scan(3)                Erlang Module Definition                erl_scan(3)

NAME

erl_scan - The Erlang token scanner.

DESCRIPTION

This module contains functions for tokenizing (scanning) characters
into Erlang tokens.

DATA TYPES

category() = atom()

error_description() = term()

error_info() =
    {erl_anno:location(), module(), error_description()}

option() =
    return | return_white_spaces | return_comments | text |
    {reserved_word_fun, resword_fun()} |
    {text_fun, text_fun()} |
    {compiler_internal, [term()]}

options() = option() | [option()]

symbol() = atom() | float() | integer() | string()

resword_fun() = fun((atom()) -> boolean())

token() =
    {category(), Anno :: erl_anno:anno(), symbol()} |
    {category(), Anno :: erl_anno:anno()}

tokens() = [token()]

tokens_result() =
    {ok, Tokens :: tokens(), EndLocation :: erl_anno:location()} |
    {eof, EndLocation :: erl_anno:location()} |
    {error,
     ErrorInfo :: error_info(),
     EndLocation :: erl_anno:location()}

text_fun() = fun((atom(), string()) -> boolean())

EXPORTS

category(Token) -> category()

    Types:

        Token = token()

    Returns the category of Token.

column(Token) -> erl_anno:column() | undefined

    Types:

        Token = token()

    Returns the column of Token's collection of annotations.

end_location(Token) -> erl_anno:location() | undefined

    Types:

        Token = token()

    Returns the end location of the text of Token's collection of
    annotations. If there is no text, undefined is returned.

format_error(ErrorDescriptor) -> string()

    Types:

        ErrorDescriptor = error_description()

    Uses an ErrorDescriptor and returns a string that describes the
    error or warning. This function is usually called implicitly
    when an ErrorInfo structure is processed (see section Error
    Information).

line(Token) -> erl_anno:line()

    Types:

        Token = token()

    Returns the line of Token's collection of annotations.

location(Token) -> erl_anno:location()

    Types:

        Token = token()

    Returns the location of Token's collection of annotations.

reserved_word(Atom :: atom()) -> boolean()

    Returns true if Atom is an Erlang reserved word, otherwise
    false.

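As a minimal sketch of how this predicate can be used, the following module splits a list of atoms into reserved words and ordinary atoms. The module name resword_demo and the function classify/1 are illustrative only and are not part of erl_scan.

```erlang
-module(resword_demo).
-export([classify/1]).

%% Splits Atoms into {ReservedWords, OrdinaryAtoms}, preserving order,
%% using erl_scan:reserved_word/1 as the predicate.
classify(Atoms) ->
    lists:partition(fun erl_scan:reserved_word/1, Atoms).
```

For example, resword_demo:classify(['case', foo, 'end', bar]) separates 'case' and 'end' from foo and bar.
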
string(String) -> Return

string(String, StartLocation) -> Return

string(String, StartLocation, Options) -> Return

    Types:

        String = string()
        Options = options()
        Return =
            {ok, Tokens :: tokens(), EndLocation} |
            {error, ErrorInfo :: error_info(), ErrorLocation}
        StartLocation = EndLocation = ErrorLocation = erl_anno:location()

    Takes the list of characters String and tries to scan (tokenize)
    them. Returns one of the following:

    {ok, Tokens, EndLocation}:
        Tokens are the Erlang tokens from String. EndLocation is the
        first location after the last token.

    {error, ErrorInfo, ErrorLocation}:
        An error occurred. ErrorLocation is the first location after
        the erroneous token.

    string(String) is equivalent to string(String, 1), and
    string(String, StartLocation) is equivalent to string(String,
    StartLocation, []).

    StartLocation indicates the initial location when scanning
    starts. If StartLocation is a line, Anno, EndLocation, and
    ErrorLocation are lines. If StartLocation is a pair of a line and
    a column, Anno takes the form of an opaque compound data type,
    and EndLocation and ErrorLocation are pairs of a line and a
    column. The token annotations contain information about the column
    and the line where the token begins, as well as the text of the
    token (if option text is specified), all of which can be
    accessed by calling column/1, line/1, location/1, and text/1.

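The effect of the two StartLocation forms can be sketched as follows. The module scan_locations and its two functions are hypothetical helpers written for this illustration; only the erl_scan calls come from this page.

```erlang
-module(scan_locations).
-export([by_line/1, by_line_and_column/1]).

%% Scans with the default start location (line 1).
%% Returns {LineOfEachToken, EndLocation}; both are plain line numbers.
by_line(String) ->
    {ok, Tokens, EndLocation} = erl_scan:string(String),
    {[erl_scan:line(T) || T <- Tokens], EndLocation}.

%% Scans starting at line 1, column 1.
%% Returns {LocationOfEachToken, EndLocation}; all are {Line, Column} pairs.
by_line_and_column(String) ->
    {ok, Tokens, EndLocation} = erl_scan:string(String, {1, 1}),
    {[erl_scan:location(T) || T <- Tokens], EndLocation}.
```

With a line as start location the accessors report lines only, whereas a {Line, Column} start location makes location/1 and EndLocation carry columns as well.
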
    A token is a tuple containing information about syntactic
    category, the token annotations, and the terminal symbol. For
    punctuation characters (such as ; and |) and reserved words, the
    category and the symbol coincide, and the token is represented
    by a two-tuple. Three-tuples have one of the following forms:

    * {atom, Anno, atom()}

    * {char, Anno, char()}

    * {comment, Anno, string()}

    * {float, Anno, float()}

    * {integer, Anno, integer()}

    * {var, Anno, atom()}

    * {white_space, Anno, string()}

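To make the two-tuple and three-tuple shapes concrete, the sketch below reduces each token to its category and symbol using category/1 and symbol/1. The module token_shapes and function summarize/1 are assumptions made for this example.

```erlang
-module(token_shapes).
-export([summarize/1]).

%% Returns {Category, Symbol} for every token scanned from String.
%% For two-tuple tokens (punctuation, reserved words, dot) the
%% category and the symbol are the same atom.
summarize(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    [{erl_scan:category(T), erl_scan:symbol(T)} || T <- Tokens].
```

Note that the scanner does not check syntax, so the input only needs to consist of valid tokens, not of a valid Erlang form.
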
    Valid options:

    {reserved_word_fun, resword_fun()}:
        A callback function that is called when the scanner has
        found an unquoted atom. If the function returns true, the
        unquoted atom itself becomes the category of the token. If
        the function returns false, atom becomes the category of the
        unquoted atom.

    return_comments:
        Return comment tokens.

    return_white_spaces:
        Return white space tokens. By convention, a newline
        character, if present, is always the first character of the
        text (there cannot be more than one newline in a white space
        token).

    return:
        Short for [return_comments, return_white_spaces].

    text:
        Include the token text in the token annotation. The text is
        the part of the input corresponding to the token. See also
        text_fun.

    {text_fun, text_fun()}:
        A callback function used to determine whether the full text
        for the token is to be included in the token annotation. The
        arguments of the function are the category of the token and
        the full token string. This option is used only when option
        text is not present. If neither is present, the text is not
        saved in the token annotation.

    {compiler_internal, [term()]}:
        Pass compiler-internal options to the scanner. The set of
        internal options understood by the scanner should be
        considered experimental and can thus be changed at any time
        without prior warning.

        The following options are currently understood:

        ssa_checks:
            Tokenizes source code annotations used for encoding tests
            on the BEAM SSA code produced by the compiler.

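A few of the options above can be sketched as follows. The module scan_options, its function names, and the extra keyword unless are hypothetical and exist only for this illustration. The sketch assumes that the text_fun callback receives the category var for variable tokens.

```erlang
-module(scan_options).
-export([custom_reserved/1, with_comments/1, var_text_only/1]).

%% reserved_word_fun: treats the made-up keyword 'unless' as reserved
%% in addition to the standard reserved words. Returns the categories.
custom_reserved(String) ->
    IsReserved = fun(Atom) ->
                         Atom =:= unless orelse erl_scan:reserved_word(Atom)
                 end,
    {ok, Tokens, _} =
        erl_scan:string(String, 1, [{reserved_word_fun, IsReserved}]),
    [erl_scan:category(T) || T <- Tokens].

%% return_comments: returns the text of every comment token.
with_comments(String) ->
    {ok, Tokens, _} = erl_scan:string(String, 1, [return_comments]),
    [erl_scan:symbol(T) || T <- Tokens, erl_scan:category(T) =:= comment].

%% text_fun: keeps the token text for variables only.
%% Returns {Category, TextOrUndefined} for each token.
var_text_only(String) ->
    KeepText = fun(Category, _Text) -> Category =:= var end,
    {ok, Tokens, _} = erl_scan:string(String, 1, [{text_fun, KeepText}]),
    [{erl_scan:category(T), erl_scan:text(T)} || T <- Tokens].
```

In custom_reserved/1 the callback delegates to erl_scan:reserved_word/1 so that the standard reserved words keep their own categories; a callback that returned false for them would turn them into ordinary atom tokens.
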
symbol(Token) -> symbol()

    Types:

        Token = token()

    Returns the symbol of Token.

text(Token) -> erl_anno:text() | undefined

    Types:

        Token = token()

    Returns the text of Token's collection of annotations. If there
    is no text, undefined is returned.

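As a small sketch of how text/1 relates to location/1 and end_location/1, the module below scans with option text from line 1, column 1. The names token_text and spans/1 are illustrative assumptions.

```erlang
-module(token_text).
-export([spans/1]).

%% Scans String from {1,1} with option text and returns
%% {Text, StartLocation, EndLocation} for each token.
spans(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String, {1, 1}, [text]),
    [{erl_scan:text(T), erl_scan:location(T), erl_scan:end_location(T)}
     || T <- Tokens].
```

Without option text (and without a text_fun that returns true), both text/1 and end_location/1 return undefined.
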
tokens(Continuation, CharSpec, StartLocation) -> Return

tokens(Continuation, CharSpec, StartLocation, Options) -> Return

    Types:

        Continuation = return_cont() | []
        CharSpec = char_spec()
        StartLocation = erl_anno:location()
        Options = options()
        Return =
            {done,
             Result :: tokens_result(),
             LeftOverChars :: char_spec()} |
            {more, Continuation1 :: return_cont()}
        char_spec() = string() | eof
        return_cont()
            An opaque continuation.

    This is the re-entrant scanner, which scans characters until
    either a dot ('.' followed by a white space) or eof is reached. It
    returns:

    {done, Result, LeftOverChars}:
        Indicates that there is sufficient input data to get a
        result. Result is:

        {ok, Tokens, EndLocation}:
            The scanning was successful. Tokens is the list of tokens
            including dot.

        {eof, EndLocation}:
            End of file was encountered before any more tokens.

        {error, ErrorInfo, EndLocation}:
            An error occurred. LeftOverChars is the remaining
            characters of the input data, starting from EndLocation.

    {more, Continuation1}:
        More data is required for building a term. Continuation1
        must be passed in a new call to tokens/3,4 when more data is
        available.

    The CharSpec eof signals end of file. LeftOverChars then takes
    the value eof as well.

    tokens(Continuation, CharSpec, StartLocation) is equivalent to
    tokens(Continuation, CharSpec, StartLocation, []).

    For a description of the options, see string/3.

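The calling protocol described above can be sketched as a loop that feeds input in chunks, starting with the continuation [] and ending with eof. The module scan_reentrant and the function scan_chunks/1 are hypothetical; this is one possible driver under those assumptions, not the only way to use tokens/3.

```erlang
-module(scan_reentrant).
-export([scan_chunks/1]).

%% Feeds the strings in Chunks to the re-entrant scanner one at a
%% time, followed by eof. Returns a list with one token list per
%% dot-terminated form, or {error, ErrorInfo} on the first scan error
%% (forms collected before the error are discarded in this sketch).
scan_chunks(Chunks) ->
    loop([], Chunks ++ [eof], 1, []).

loop(_Cont, [], _Location, Acc) ->
    lists:reverse(Acc);
loop(Cont, [Chunk | Rest], Location, Acc) ->
    case erl_scan:tokens(Cont, Chunk, Location) of
        {more, Cont1} ->
            %% Not enough input yet: pass the continuation back in
            %% together with the next chunk.
            loop(Cont1, Rest, Location, Acc);
        {done, {ok, Tokens, EndLocation}, LeftOver} ->
            %% One form is complete. Start over with [] and rescan the
            %% left-over characters from EndLocation.
            loop([], [LeftOver | Rest], EndLocation, [Tokens | Acc]);
        {done, {eof, _EndLocation}, _LeftOver} ->
            lists:reverse(Acc);
        {done, {error, ErrorInfo, _EndLocation}, _LeftOver} ->
            {error, ErrorInfo}
    end.
```

Because a form can end in the middle of a chunk, the left-over characters are pushed back onto the front of the remaining input before scanning resumes.
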
ERROR INFORMATION

ErrorInfo is the standard ErrorInfo structure that is returned from all
I/O modules. The format is as follows:

    {ErrorLocation, Module, ErrorDescriptor}

A string describing the error is obtained with the following call:

    Module:format_error(ErrorDescriptor)

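A minimal sketch of handling an ErrorInfo structure returned by string/1 follows. The module scan_errors and function describe/1 are illustrative names, and the exact wording of the message is deliberately not assumed.

```erlang
-module(scan_errors).
-export([describe/1]).

%% Scans String. On success returns {ok, Tokens}; on failure returns
%% {error, Location, Message}, where Message is the flattened string
%% produced by Module:format_error/1.
describe(String) ->
    case erl_scan:string(String) of
        {ok, Tokens, _EndLocation} ->
            {ok, Tokens};
        {error, {Location, Module, ErrorDescriptor}, _ErrorLocation} ->
            %% format_error/1 may return a deep character list, so it
            %% is flattened before being returned.
            Message = lists:flatten(Module:format_error(ErrorDescriptor)),
            {error, Location, Message}
    end.
```

Calling through Module rather than erl_scan directly keeps the same code usable for ErrorInfo structures produced by other modules.
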
NOTES

The continuation of the first call to the re-entrant input functions
must be []. For a complete description of how the re-entrant input
scheme works, see Armstrong, Virding and Williams: 'Concurrent
Programming in Erlang', Chapter 13.

SEE ALSO

erl_anno(3), erl_parse(3), io(3)

Ericsson AB                      stdlib 5.1.1                      erl_scan(3)