erl_scan(3)              Erlang Module Definition              erl_scan(3)

erl_scan - The Erlang token scanner.

This module contains functions for tokenizing (scanning) characters
into Erlang tokens.

category() = atom()

error_description() = term()

error_info() =
    {erl_anno:location(), module(), error_description()}

option() =
    return | return_white_spaces | return_comments | text |
    {reserved_word_fun, resword_fun()} |
    {text_fun, text_fun()}

options() = option() | [option()]

symbol() = atom() | float() | integer() | string()

resword_fun() = fun((atom()) -> boolean())

token() =
    {category(), Anno :: erl_anno:anno(), symbol()} |
    {category(), Anno :: erl_anno:anno()}

tokens() = [token()]

tokens_result() =
    {ok, Tokens :: tokens(), EndLocation :: erl_anno:location()} |
    {eof, EndLocation :: erl_anno:location()} |
    {error,
     ErrorInfo :: error_info(),
     EndLocation :: erl_anno:location()}

text_fun() = fun((atom(), string()) -> boolean())

category(Token) -> category()

       Types:

          Token = token()

       Returns the category of Token.

column(Token) -> erl_anno:column() | undefined

       Types:

          Token = token()

       Returns the column of Token's collection of annotations.

end_location(Token) -> erl_anno:location() | undefined

       Types:

          Token = token()

       Returns the end location of the text of Token's collection of
       annotations. If there is no text, undefined is returned.

format_error(ErrorDescriptor) -> string()

       Types:

          ErrorDescriptor = error_description()

       Uses an ErrorDescriptor and returns a string that describes the
       error or warning. This function is usually called implicitly
       when an ErrorInfo structure is processed (see section Error
       Information).

line(Token) -> erl_anno:line()

       Types:

          Token = token()

       Returns the line of Token's collection of annotations.

location(Token) -> erl_anno:location()

       Types:

          Token = token()

       Returns the location of Token's collection of annotations.

reserved_word(Atom :: atom()) -> boolean()

       Returns true if Atom is an Erlang reserved word, otherwise
       false.

string(String) -> Return

string(String, StartLocation) -> Return

string(String, StartLocation, Options) -> Return

       Types:

          String = string()
          Options = options()
          Return =
              {ok, Tokens :: tokens(), EndLocation} |
              {error, ErrorInfo :: error_info(), ErrorLocation}
          StartLocation = EndLocation = ErrorLocation =
              erl_anno:location()

       Takes the list of characters String and tries to scan (tokenize)
       them. Returns one of the following:

       {ok, Tokens, EndLocation}:
          Tokens are the Erlang tokens from String. EndLocation is the
          first location after the last token.

       {error, ErrorInfo, ErrorLocation}:
          An error occurred. ErrorLocation is the first location after
          the erroneous token.

       string(String) is equivalent to string(String, 1), and
       string(String, StartLocation) is equivalent to string(String,
       StartLocation, []).

       StartLocation indicates the initial location when scanning
       starts. If StartLocation is a line, Anno, EndLocation, and
       ErrorLocation are lines. If StartLocation is a pair of a line and
       a column, Anno takes the form of an opaque compound data type,
       and EndLocation and ErrorLocation are pairs of a line and a
       column. The token annotations contain information about the column
       and the line where the token begins, as well as the text of the
       token (if option text is specified), all of which can be
       accessed by calling column/1, line/1, location/1, and text/1.
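
The effect of StartLocation on the returned locations can be sketched as
follows. This is a minimal illustration only; the module name
scan_location_demo and its function names are hypothetical and not part of
erl_scan.

```erlang
%% Hypothetical demo module; only the erl_scan calls come from this page.
-module(scan_location_demo).
-export([with_line/0, with_line_and_column/0]).

%% StartLocation defaults to the line 1, so token locations and
%% EndLocation are plain line numbers.
with_line() ->
    {ok, Tokens, EndLocation} = erl_scan:string("foo(Bar)."),
    {[erl_scan:line(T) || T <- Tokens], EndLocation}.

%% StartLocation is a {Line, Column} pair, so token locations and
%% EndLocation are {Line, Column} pairs as well.
with_line_and_column() ->
    {ok, Tokens, EndLocation} = erl_scan:string("foo(Bar).", {1, 1}),
    {[erl_scan:location(T) || T <- Tokens], EndLocation}.
```

The annotations are read through the accessor functions rather than by
matching on Anno directly, since its representation is opaque when columns
are in use.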

       A token is a tuple containing information about syntactic
       category, the token annotations, and the terminal symbol. For
       punctuation characters (such as ; and |) and reserved words, the
       category and the symbol coincide, and the token is represented
       by a two-tuple. Three-tuples have one of the following forms:

         * {atom, Anno, atom()}

         * {char, Anno, char()}

         * {comment, Anno, string()}

         * {float, Anno, float()}

         * {integer, Anno, integer()}

         * {string, Anno, string()}

         * {var, Anno, atom()}

         * {white_space, Anno, string()}
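
As a small sketch of the two token shapes, the following scans a short
expression and pairs each token's category with its symbol. The module name
scan_token_demo is hypothetical. The comment about two-tuples relies on the
statement above that category and symbol coincide for them.

```erlang
%% Hypothetical demo module; only the erl_scan calls come from this page.
-module(scan_token_demo).
-export([shapes/0]).

%% Returns [{Category, Symbol}] for every token in a short expression.
%% Reserved words and punctuation are two-tuples, so their category and
%% symbol are the same atom; the other tokens carry a separate symbol.
shapes() ->
    {ok, Tokens, _EndLocation} = erl_scan:string("case X of 42 -> ok end."),
    [{erl_scan:category(T), erl_scan:symbol(T)} || T <- Tokens].
```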

       Valid options:

       {reserved_word_fun, resword_fun()}:
          A callback function that is called when the scanner has
          found an unquoted atom. If the function returns true, the
          unquoted atom itself becomes the category of the token. If
          the function returns false, atom becomes the category of the
          unquoted atom.

       return_comments:
          Return comment tokens.

       return_white_spaces:
          Return white space tokens. By convention, a newline
          character, if present, is always the first character of the
          text (there cannot be more than one newline in a white space
          token).

       return:
          Short for [return_comments, return_white_spaces].

       text:
          Include the token text in the token annotation. The text is
          the part of the input corresponding to the token. See also
          text_fun.

       {text_fun, text_fun()}:
          A callback function used to determine whether the full text
          of the token is to be included in the token annotation. The
          arguments of the function are the category of the token and
          the full token string. The callback is used only when option
          text is not present. If neither option is present, the text
          is not saved in the token annotation.
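
A few of the options can be tried out with a sketch like the one below. The
module name scan_options_demo and the extra reserved word unless are
assumptions made for illustration; unless is not an Erlang reserved word.

```erlang
%% Hypothetical demo module; only the erl_scan calls come from this page.
-module(scan_options_demo).
-export([comments/0, token_text/0, custom_reserved/0]).

%% return_comments keeps comment tokens; white space is still dropped.
comments() ->
    {ok, Tokens, _} = erl_scan:string("ok % trailing note", 1,
                                      [return_comments]),
    [erl_scan:category(T) || T <- Tokens].

%% text stores the source text of each token in its annotation,
%% where text/1 can read it back.
token_text() ->
    {ok, Tokens, _} = erl_scan:string("16#ff + X", 1, [text]),
    [erl_scan:text(T) || T <- Tokens].

%% reserved_word_fun treats the made-up word 'unless' as reserved,
%% in addition to the standard reserved words.
custom_reserved() ->
    IsReserved = fun(Atom) ->
                         Atom =:= unless orelse erl_scan:reserved_word(Atom)
                 end,
    {ok, Tokens, _} = erl_scan:string("unless foo end", 1,
                                      [{reserved_word_fun, IsReserved}]),
    Tokens.
```

Falling back to reserved_word/1 inside the callback keeps the standard
reserved words working; a callback that only recognized unless would turn
words such as end into ordinary atoms.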

symbol(Token) -> symbol()

       Types:

          Token = token()

       Returns the symbol of Token.

text(Token) -> erl_anno:text() | undefined

       Types:

          Token = token()

       Returns the text of Token's collection of annotations. If there
       is no text, undefined is returned.

tokens(Continuation, CharSpec, StartLocation) -> Return

tokens(Continuation, CharSpec, StartLocation, Options) -> Return

       Types:

          Continuation = return_cont() | []
          CharSpec = char_spec()
          StartLocation = erl_anno:location()
          Options = options()
          Return =
              {done,
               Result :: tokens_result(),
               LeftOverChars :: char_spec()} |
              {more, Continuation1 :: return_cont()}
          char_spec() = string() | eof
          return_cont()
            An opaque continuation.

       This is the re-entrant scanner, which scans characters until
       either a dot ('.' followed by a white space) or eof is reached. It
       returns:

       {done, Result, LeftOverChars}:
          Indicates that there is sufficient input data to get a
          result. Result is:

          {ok, Tokens, EndLocation}:
             The scanning was successful. Tokens is the list of tokens
             including dot.

          {eof, EndLocation}:
             End of file was encountered before any more tokens.

          {error, ErrorInfo, EndLocation}:
             An error occurred. LeftOverChars is the remaining
             characters of the input data, starting from EndLocation.

       {more, Continuation1}:
          More data is required for building a term. Continuation1
          must be passed in a new call to tokens/3,4 when more data is
          available.

       The CharSpec eof signals end of file. LeftOverChars then takes
       the value eof as well.

       tokens(Continuation, CharSpec, StartLocation) is equivalent to
       tokens(Continuation, CharSpec, StartLocation, []).

       For a description of the options, see string/3.
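
The calling pattern for the re-entrant scanner can be sketched as a loop
that feeds input in chunks. This is an illustration under assumptions: the
module name scan_reentrant_demo and the helper scan_chunks/1 are
hypothetical, and the input is assumed to arrive as a list of strings.

```erlang
%% Hypothetical demo module; only the erl_scan calls come from this page.
-module(scan_reentrant_demo).
-export([scan_chunks/1]).

%% Feeds Chunks (a list of strings) to erl_scan:tokens/3 until one
%% dot-terminated sequence of tokens is complete or the input runs out.
%% Returns {Result, LeftOverChars, UnusedChunks}.
scan_chunks(Chunks) ->
    %% The continuation of the first call must be [].
    scan_chunks([], Chunks, 1).

scan_chunks(Cont, [Chunk | Rest], StartLocation) ->
    case erl_scan:tokens(Cont, Chunk, StartLocation) of
        {done, Result, LeftOverChars} ->
            {Result, LeftOverChars, Rest};
        {more, Cont1} ->
            %% Not enough input yet: pass the new continuation along.
            scan_chunks(Cont1, Rest, StartLocation)
    end;
scan_chunks(Cont, [], StartLocation) ->
    %% No more chunks: signal end of file to force a final result.
    {done, Result, LeftOverChars} = erl_scan:tokens(Cont, eof, StartLocation),
    {Result, LeftOverChars, []}.
```

A caller that wants every form in the input would prepend LeftOverChars to
the next chunk and start again with the continuation [].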

ErrorInfo is the standard ErrorInfo structure that is returned from all
I/O modules. The format is as follows:

    {ErrorLocation, Module, ErrorDescriptor}

A string describing the error is obtained with the following call:

    Module:format_error(ErrorDescriptor)
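
A minimal sketch of handling a scan error through this structure follows.
The module name scan_error_demo and the function describe/1 are
hypothetical names chosen for the example.

```erlang
%% Hypothetical demo module; only the erl_scan calls come from this page.
-module(scan_error_demo).
-export([describe/1]).

%% Scans String and, on failure, turns the ErrorInfo tuple into a
%% readable message by calling format_error/1 in the reporting module.
describe(String) ->
    case erl_scan:string(String) of
        {ok, Tokens, _EndLocation} ->
            {ok, Tokens};
        {error, {ErrorLocation, Module, ErrorDescriptor}, _ErrorEnd} ->
            Message = Module:format_error(ErrorDescriptor),
            %% Flatten defensively in case a module returns a deep list.
            {error, ErrorLocation, lists:flatten(Message)}
    end.
```

Calling format_error/1 through Module rather than erl_scan directly keeps
the code usable for ErrorInfo tuples produced by other modules, such as
erl_parse.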

The continuation of the first call to the re-entrant input functions
must be []. For a complete description of how the re-entrant input
scheme works, see Armstrong, Virding and Williams: 'Concurrent
Programming in Erlang', Chapter 13.

erl_anno(3), erl_parse(3), io(3)

Ericsson AB                     stdlib 4.2                     erl_scan(3)