erl_scan(3)                Erlang Module Definition                erl_scan(3)

NAME

       erl_scan - The Erlang token scanner.

DESCRIPTION

       This module contains functions for tokenizing (scanning) characters
       into Erlang tokens.

DATA TYPES

       category() = atom()

       error_description() = term()

       error_info() =
           {erl_anno:location(), module(), error_description()}

       option() =
           return | return_white_spaces | return_comments | text |
           {reserved_word_fun, resword_fun()}

       options() = option() | [option()]

       symbol() = atom() | float() | integer() | string()

       resword_fun() = fun((atom()) -> boolean())

       token() =
           {category(), Anno :: erl_anno:anno(), symbol()} |
           {category(), Anno :: erl_anno:anno()}

       tokens() = [token()]

       tokens_result() =
           {ok, Tokens :: tokens(), EndLocation :: erl_anno:location()} |
           {eof, EndLocation :: erl_anno:location()} |
           {error,
            ErrorInfo :: error_info(),
            EndLocation :: erl_anno:location()}

EXPORTS

       category(Token) -> category()

              Types:

                 Token = token()

              Returns the category of Token.
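
              As a minimal sketch, category/1 can be mapped over the tokens
              returned by string/1 to show how each piece of input is
              classified. The module name scan_categories is hypothetical.

```erlang
-module(scan_categories).
-export([categories/1]).

%% Scan String and return the category of each token, in order.
%% Crashes with badmatch if String cannot be scanned.
categories(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    [erl_scan:category(Token) || Token <- Tokens].
```

              Calling scan_categories:categories("X = foo(1).") should give
              [var,'=',atom,'(',integer,')',dot].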

       column(Token) -> erl_anno:column() | undefined

              Types:

                 Token = token()

              Returns the column of Token's collection of annotations, or
              undefined if the annotations hold no column (for example, when
              scanning started from a line number rather than a pair of a
              line and a column).

       end_location(Token) -> erl_anno:location() | undefined

              Types:

                 Token = token()

              Returns the end location of the text of Token's collection of
              annotations. If there is no text, undefined is returned.
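
              The following sketch pairs each token's start location with
              its end location. It assumes scanning starts at {1, 1} so that
              locations carry columns, and it passes option text, since
              end_location/1 returns undefined when the annotations hold no
              text. The module name scan_end_locations is hypothetical.

```erlang
-module(scan_end_locations).
-export([spans/1]).

%% Return {StartLocation, EndLocation} for each token. The text option
%% is required: without it, erl_scan:end_location/1 returns undefined.
spans(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String, {1, 1}, [text]),
    [{erl_scan:location(T), erl_scan:end_location(T)} || T <- Tokens].
```

              For "foo bar", this should return [{{1,1},{1,4}},{{1,5},{1,8}}].
              The end location is the first position after the token text.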

       format_error(ErrorDescriptor) -> string()

              Types:

                 ErrorDescriptor = error_description()

              Uses an ErrorDescriptor and returns a string that describes the
              error or warning. This function is usually called implicitly
              when an ErrorInfo structure is processed (see section Error
              Information).

       line(Token) -> erl_anno:line()

              Types:

                 Token = token()

              Returns the line of Token's collection of annotations.

       location(Token) -> erl_anno:location()

              Types:

                 Token = token()

              Returns the location of Token's collection of annotations.

       reserved_word(Atom :: atom()) -> boolean()

              Returns true if Atom is an Erlang reserved word, otherwise
              false.
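
              Because reserved_word/1 is an ordinary predicate, it can be
              passed directly to list functions. The sketch below splits
              atoms into reserved words and other atoms. It uses only
              long-standing keywords such as case and receive. The module
              name scan_reserved is hypothetical.

```erlang
-module(scan_reserved).
-export([split_reserved/1]).

%% Partition a list of atoms into {ReservedWords, OtherAtoms}, keeping
%% the original order within each group.
split_reserved(Atoms) ->
    lists:partition(fun erl_scan:reserved_word/1, Atoms).
```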

       string(String) -> Return

       string(String, StartLocation) -> Return

       string(String, StartLocation, Options) -> Return

              Types:

                 String = string()
                 Options = options()
                 Return =
                     {ok, Tokens :: tokens(), EndLocation} |
                     {error, ErrorInfo :: error_info(), ErrorLocation}
                 StartLocation = EndLocation = ErrorLocation = erl_anno:location()

              Takes the list of characters String and tries to scan (tokenize)
              them. Returns one of the following:

                {ok, Tokens, EndLocation}:
                  Tokens are the Erlang tokens from String. EndLocation is the
                  first location after the last token.

                {error, ErrorInfo, ErrorLocation}:
                  An error occurred. ErrorLocation is the first location after
                  the erroneous token.

              string(String) is equivalent to string(String, 1), and
              string(String, StartLocation) is equivalent to string(String,
              StartLocation, []).
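
              A minimal sketch that exercises the three arities and the error
              return. The module name scan_string_demo is hypothetical. The
              error case uses an unterminated quoted atom, and the descriptor
              is matched loosely because its exact shape is not specified
              here.

```erlang
-module(scan_string_demo).
-export([demo/0]).

%% Exercise string/1,2,3 and the {error, ErrorInfo, ErrorLocation} return.
demo() ->
    Src = "hello(World).",
    %% string/1 starts at line 1; with no newline in Src the end
    %% location is also line 1.
    {ok, Tokens1, 1} = erl_scan:string(Src),
    %% string/2 starting at line 10: every token is annotated with 10.
    {ok, Tokens10, 10} = erl_scan:string(Src, 10),
    %% string/3 with an empty option list behaves like string/2.
    {ok, Tokens10, 10} = erl_scan:string(Src, 10, []),
    %% An unterminated quoted atom is reported with module erl_scan.
    {error, {_Location, erl_scan, _Descriptor}, _ErrorLocation} =
        erl_scan:string("'abc"),
    {Tokens1, Tokens10}.
```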

              StartLocation indicates the initial location when scanning
              starts. If StartLocation is a line, Anno, EndLocation, and
              ErrorLocation are lines. If StartLocation is a pair of a line
              and a column, Anno takes the form of an opaque compound data
              type, and EndLocation and ErrorLocation are pairs of a line and
              a column. The token annotations contain information about the
              column and the line where the token begins, as well as the text
              of the token (if option text is specified), all of which can be
              accessed by calling column/1, line/1, location/1, and text/1.
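
              The effect of StartLocation can be seen by reading the
              annotations back through line/1, column/1, and location/1. In
              this sketch (hypothetical module scan_positions), the same
              two-line input is scanned once from a line and once from a
              line and column.

```erlang
-module(scan_positions).
-export([positions/2]).

%% For each token return {Line, Column, Location}. Column is undefined
%% when StartLocation is a plain line number.
positions(String, StartLocation) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String, StartLocation),
    [{erl_scan:line(T), erl_scan:column(T), erl_scan:location(T)}
     || T <- Tokens].
```

              For "a\n  b", starting at 1 should give
              [{1,undefined,1},{2,undefined,2}], while starting at {1, 1}
              should give [{1,1,{1,1}},{2,3,{2,3}}]. The column restarts at 1
              after the newline.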

              A token is a tuple containing information about syntactic
              category, the token annotations, and the terminal symbol. For
              punctuation characters (such as ; and |) and reserved words, the
              category and the symbol coincide, and the token is represented
              by a two-tuple. Three-tuples have one of the following forms:

                * {atom, Anno, atom()}

                * {char, Anno, char()}

                * {comment, Anno, string()}

                * {float, Anno, float()}

                * {integer, Anno, integer()}

                * {string, Anno, string()}

                * {var, Anno, atom()}

                * {white_space, Anno, string()}
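
              Since the two token shapes differ only in size, a function can
              tell them apart by pattern matching alone. The sketch below
              (hypothetical module scan_token_shapes) tags two-tuples as
              fixed, meaning the category doubles as the symbol, and returns
              {Category, Symbol} for three-tuples.

```erlang
-module(scan_token_shapes).
-export([describe/1]).

%% Two-tuples: punctuation and reserved words, where the category
%% doubles as the symbol (the terminating dot is one of these).
describe({Category, _Anno}) ->
    {fixed, Category};
%% Three-tuples: a category plus a separate symbol.
describe({Category, _Anno, Symbol}) ->
    {Category, Symbol}.
```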

              Valid options:

                {reserved_word_fun, resword_fun()}:
                  A callback function that is called when the scanner has
                  found an unquoted atom. If the function returns true, the
                  unquoted atom itself becomes the category of the token. If
                  the function returns false, the category of the token is
                  atom.

                return_comments:
                  Return comment tokens.

                return_white_spaces:
                  Return white space tokens. By convention, a newline
                  character, if present, is always the first character of the
                  text (there cannot be more than one newline in a white space
                  token).

                return:
                  Short for [return_comments, return_white_spaces].

                text:
                  Include the token text in the token annotation. The text is
                  the part of the input corresponding to the token.
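
              A sketch of the options in use, with the hypothetical module
              scan_options_demo. with_until/1 passes a reserved_word_fun that
              also treats the made-up keyword until as reserved. It falls
              back to erl_scan:reserved_word/1 so that the standard keywords
              stay reserved. categories/2 shows which extra tokens
              return_comments and return add for the same input.

```erlang
-module(scan_options_demo).
-export([with_until/1, categories/2]).

%% Scan with a reserved_word_fun that treats the hypothetical keyword
%% 'until' as reserved, deferring to erl_scan:reserved_word/1 otherwise.
with_until(String) ->
    IsReserved = fun(until) -> true;
                    (Atom) -> erl_scan:reserved_word(Atom)
                 end,
    {ok, Tokens, _} =
        erl_scan:string(String, 1, [{reserved_word_fun, IsReserved}]),
    Tokens.

%% Return the token categories produced under the given options.
categories(String, Options) ->
    {ok, Tokens, _} = erl_scan:string(String, 1, Options),
    [erl_scan:category(T) || T <- Tokens].
```

              With the callback, "until case" should scan to
              [{until,1},{'case',1}] instead of [{atom,1,until},{'case',1}].
              For "a % note\n", no options should give only [atom], and
              return_comments should add a comment token. return should also
              add white_space tokens.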

       symbol(Token) -> symbol()

              Types:

                 Token = token()

              Returns the symbol of Token.
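
              For a three-tuple token, symbol/1 returns the third element. For
              a two-tuple token, it returns the category itself. A short
              sketch with the hypothetical module scan_symbols:

```erlang
-module(scan_symbols).
-export([symbols/1]).

%% Scan String and return the symbol of each token. For two-tuple
%% tokens the symbol is the category itself (for example '(' or dot).
symbols(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    [erl_scan:symbol(T) || T <- Tokens].
```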

       text(Token) -> erl_anno:text() | undefined

              Types:

                 Token = token()

              Returns the text of Token's collection of annotations. If there
              is no text, undefined is returned.
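
              The text is the raw input, which can differ from the symbol.
              For example, the integer written 16#1F has symbol 31 but text
              "16#1F". A sketch with the hypothetical module scan_texts:

```erlang
-module(scan_texts).
-export([texts/2]).

%% Return the source text of each token. Without the text option the
%% annotations hold no text and every element is undefined.
texts(String, Options) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String, 1, Options),
    [erl_scan:text(T) || T <- Tokens].
```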

       tokens(Continuation, CharSpec, StartLocation) -> Return

       tokens(Continuation, CharSpec, StartLocation, Options) -> Return

              Types:

                 Continuation = return_cont() | []
                 CharSpec = char_spec()
                 StartLocation = erl_anno:location()
                 Options = options()
                 Return =
                     {done,
                      Result :: tokens_result(),
                      LeftOverChars :: char_spec()} |
                     {more, Continuation1 :: return_cont()}
                 char_spec() = string() | eof
                 return_cont()
                   An opaque continuation.

              This is the re-entrant scanner, which scans characters until
              either a dot ('.' followed by white space) or eof is reached.
              It returns:

                {done, Result, LeftOverChars}:
                  Indicates that there is sufficient input data to get a
                  result. Result is:

                  {ok, Tokens, EndLocation}:
                    The scanning was successful. Tokens is the list of tokens,
                    including the final dot token.

                  {eof, EndLocation}:
                    End of file was encountered before any more tokens.

                  {error, ErrorInfo, EndLocation}:
                    An error occurred. LeftOverChars is the remaining
                    characters of the input data, starting from EndLocation.

                {more, Continuation1}:
                  More data is required for building a term. Continuation1
                  must be passed in a new call to tokens/3,4 when more data is
                  available.

              The CharSpec eof signals end of file. LeftOverChars then takes
              the value eof as well.

              tokens(Continuation, CharSpec, StartLocation) is equivalent to
              tokens(Continuation, CharSpec, StartLocation, []).

              For a description of the options, see string/3.
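
              A minimal sketch of a driver loop for the re-entrant scanner,
              using the hypothetical module scan_chunks. A list of strings
              stands in for input that arrives piecewise. The first call
              passes [] as the continuation. {more, Continuation1} is
              threaded into the next call. After each {done, {ok, ...}, ...},
              scanning restarts on LeftOverChars, and eof is passed once the
              input is exhausted. Error handling is simplified to returning
              the ErrorInfo.

```erlang
-module(scan_chunks).
-export([scan_chunks/1]).

%% Feed string chunks to erl_scan:tokens/3 and return one token list per
%% scanned form, or {error, ErrorInfo}.
scan_chunks(Chunks) ->
    loop(Chunks, [], 1, []).

loop([Chunk | Rest], Cont, Loc, Acc) ->
    step(erl_scan:tokens(Cont, Chunk, Loc), Rest, Loc, Acc);
loop([], Cont, Loc, Acc) ->
    %% Input exhausted: pass eof so that any pending tokens are flushed.
    case erl_scan:tokens(Cont, eof, Loc) of
        {done, {ok, Tokens, _EndLoc}, eof} -> lists:reverse([Tokens | Acc]);
        {done, {eof, _EndLoc}, eof} -> lists:reverse(Acc);
        {done, {error, ErrorInfo, _EndLoc}, _} -> {error, ErrorInfo}
    end.

%% Handle one result from erl_scan:tokens/3 for a string chunk.
step({more, Cont1}, Rest, Loc, Acc) ->
    %% Not enough input yet: keep the continuation for the next chunk.
    loop(Rest, Cont1, Loc, Acc);
step({done, {ok, Tokens, EndLoc}, LeftOver}, Rest, _Loc, Acc) ->
    %% A complete form was scanned: start afresh ([] continuation) on the
    %% leftover characters, continuing from the form's end location.
    step(erl_scan:tokens([], LeftOver, EndLoc), Rest, EndLoc,
         [Tokens | Acc]);
step({done, {error, ErrorInfo, _EndLoc}, _LeftOver}, _Rest, _Loc, _Acc) ->
    {error, ErrorInfo}.
```

              Splitting "a.\nb." as ["a", ".\nb", "."] should still yield
              [[{atom,1,a},{dot,1}],[{atom,2,b},{dot,2}]], because each
              partial form is carried over in the continuation.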

ERROR INFORMATION

       ErrorInfo is the standard ErrorInfo structure that is returned from all
       I/O modules. The format is as follows:

       {ErrorLocation, Module, ErrorDescriptor}

       A string describing the error is obtained with the following call:

       Module:format_error(ErrorDescriptor)
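
       The sketch below (hypothetical module scan_errors) destructures the
       ErrorInfo returned by string/1 and calls Module:format_error/1 to
       build a readable message prefixed with the error location.
       Formatting the location with ~p is an arbitrary choice for
       illustration.

```erlang
-module(scan_errors).
-export([scan_or_report/1]).

%% Scan String. On failure, let the reporting module format its own
%% error descriptor, as described for the ErrorInfo structure.
scan_or_report(String) ->
    case erl_scan:string(String) of
        {ok, Tokens, _EndLocation} ->
            {ok, Tokens};
        {error, {ErrorLocation, Module, ErrorDescriptor}, _Where} ->
            Message = Module:format_error(ErrorDescriptor),
            {error, lists:flatten(io_lib:format("~p: ~ts",
                                                [ErrorLocation, Message]))}
    end.
```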

NOTES

       The continuation of the first call to the re-entrant input functions
       must be []. For a complete description of how the re-entrant input
       scheme works, see Armstrong, Virding and Williams: 'Concurrent
       Programming in Erlang', Chapter 13.

SEE ALSO

       erl_anno(3), erl_parse(3), io(3)

Ericsson AB                      stdlib 3.12.1                     erl_scan(3)