Lexing(3) OCaml library Lexing(3)

Lexing - The run-time library for lexers generated by ocamllex.

Module Lexing
 : sig end

Positions
type position = {
  pos_fname : string ;
  pos_lnum : int ;
  pos_bol : int ;
  pos_cnum : int ;
}

A value of type position describes a point in a source file. pos_fname
is the file name; pos_lnum is the line number; pos_bol is the offset of
the beginning of the line (number of characters between the beginning
of the lexbuf and the beginning of the line); pos_cnum is the offset of
the position (number of characters between the beginning of the lexbuf
and the position). The difference between pos_cnum and pos_bol is the
character offset within the line (i.e. the column number, assuming each
character is one column wide).

See the documentation of type lexbuf for information about how the
lexing engine will manage positions.

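As an illustration of the field arithmetic described above, the following
sketch derives a column number from a position and renders the position as
file:line:column. The names column, show_position and example are
hypothetical and not part of Lexing; the column computed here is 0-based.

```ocaml
(* Hypothetical helpers, not part of the Lexing module. *)

(* 0-based column: offset of the position within its line. *)
let column (p : Lexing.position) : int =
  p.Lexing.pos_cnum - p.Lexing.pos_bol

(* Render a position as "file:line:column". *)
let show_position (p : Lexing.position) : string =
  Printf.sprintf "%s:%d:%d" p.Lexing.pos_fname p.Lexing.pos_lnum (column p)

(* A made-up position: line 3 starts at offset 40, the point is at 47. *)
let example : Lexing.position =
  { Lexing.pos_fname = "demo.ml"; pos_lnum = 3; pos_bol = 40; pos_cnum = 47 }
```

Tools that report 1-based columns would add 1 to the result of column.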
val dummy_pos : position

A value of type position, guaranteed to be different from any valid
position.

Lexer buffers
type lexbuf = {
  refill_buff : lexbuf -> unit ;
  mutable lex_buffer : bytes ;
  mutable lex_buffer_len : int ;
  mutable lex_abs_pos : int ;
  mutable lex_start_pos : int ;
  mutable lex_curr_pos : int ;
  mutable lex_last_pos : int ;
  mutable lex_last_action : int ;
  mutable lex_eof_reached : bool ;
  mutable lex_mem : int array ;
  mutable lex_start_p : position ;
  mutable lex_curr_p : position ;
}

The type of lexer buffers. A lexer buffer is the argument passed to the
scanning functions defined by the generated scanners. The lexer buffer
holds the current state of the scanner, plus a function to refill the
buffer from the input.

Lexers can optionally maintain the lex_curr_p and lex_start_p position
fields. This "position tracking" mode is the default, and it
corresponds to passing ~with_positions:true to functions that create
lexer buffers. In this mode, the lexing engine and lexer actions are
co-responsible for properly updating the position fields, as described
in the next paragraph. When the mode is explicitly disabled (with
~with_positions:false), the lexing engine will not touch the position
fields and the lexer actions should be careful not to do so either; the
lex_curr_p and lex_start_p fields will then always hold the invalid
position dummy_pos. Not tracking positions avoids allocations and memory
writes and can significantly improve the performance of the lexer in
contexts where lex_start_p and lex_curr_p are not needed.

Position tracking mode works as follows. At each token, the lexing
engine will copy lex_curr_p to lex_start_p, then set the pos_cnum field
of lex_curr_p to the number of characters read since the start of the
lexbuf. The other fields are left unchanged by the lexing engine. In
order to keep them accurate, they must be initialised before the first
use of the lexbuf, and updated by the relevant lexer actions (i.e. at
each end of line -- see also new_line).

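The initialisation step mentioned above can be sketched as follows: the
file name is stored in lex_curr_p before the lexbuf is first used, and the
other fields keep the values given by the creation function. The helper
name init_position and the file name are hypothetical. Assigning the record
field directly is only one way to do this; Lexing.set_filename is an
alternative in OCaml 4.11 and later.

```ocaml
(* Hypothetical helper: give a fresh lexbuf a file name before first use.
   Only pos_fname is changed; the other fields keep their initial values. *)
let init_position (lexbuf : Lexing.lexbuf) (fname : string) : unit =
  let open Lexing in
  lexbuf.lex_curr_p <- { lexbuf.lex_curr_p with pos_fname = fname }

let lexbuf = Lexing.from_string "let x = 1\n"
let () = init_position lexbuf "demo.ml"
```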
val from_channel : ?with_positions:bool -> in_channel -> lexbuf

Create a lexer buffer on the given input channel. Lexing.from_channel
inchan returns a lexer buffer which reads from the input channel
inchan, at the current reading position.

val from_string : ?with_positions:bool -> string -> lexbuf

Create a lexer buffer which reads from the given string. Reading starts
from the first character in the string. An end-of-input condition is
generated when the end of the string is reached.

val from_function : ?with_positions:bool -> (bytes -> int -> int) ->
lexbuf

Create a lexer buffer with the given function as its reading method.
When the scanner needs more characters, it will call the given
function, giving it a byte sequence s and a byte count n. The function
should put n bytes or fewer in s, starting at index 0, and return the
number of bytes provided. A return value of 0 means end of input.

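A minimal sketch of a reading function that follows the contract above,
here serving bytes from an in-memory string. The helper name
reader_of_string is hypothetical; a real reader would more typically pull
from a socket, a decompressor, or a line editor.

```ocaml
(* Hypothetical helper: build a reading function over an in-memory string.
   It writes at most n bytes into buf starting at index 0, returns how many
   bytes were written, and returns 0 once the string is exhausted. *)
let reader_of_string (s : string) : bytes -> int -> int =
  let pos = ref 0 in
  fun buf n ->
    let len = min n (String.length s - !pos) in
    Bytes.blit_string s !pos buf 0 len;
    pos := !pos + len;
    len

(* The resulting lexbuf can be passed to any ocamllex-generated scanner. *)
let lexbuf : Lexing.lexbuf = Lexing.from_function (reader_of_string "hello")
```

Returning fewer than n bytes is allowed, so the reader never has to block
until a full chunk is available.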
val set_position : lexbuf -> position -> unit

Set the initial tracked input position for lexbuf to a custom value.
The pos_fname field of the given position is ignored; see
Lexing.set_filename for changing that field.

Since 4.11

val set_filename : lexbuf -> string -> unit

Set the file name (pos_fname) in the initial tracked position of
lexbuf.

Since 4.11

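The two functions above are commonly used together when lexing a fragment
taken from a larger file. The sketch below assumes OCaml 4.11 or later;
the file name and offsets are made up for illustration.

```ocaml
(* Requires OCaml 4.11 or later. File name and offsets are made up. *)
let lexbuf = Lexing.from_string "x + y"

let () =
  Lexing.set_filename lexbuf "fragment.ml";
  (* Pretend the fragment starts at line 10, column 4 of a larger file.
     The pos_fname given here is ignored by set_position. *)
  Lexing.set_position lexbuf
    { Lexing.pos_fname = "ignored";
      pos_lnum = 10;
      pos_bol = 120;
      pos_cnum = 124 }
```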
val with_positions : lexbuf -> bool

Tell whether the lexer buffer keeps track of the position fields
lex_curr_p / lex_start_p, as determined by the corresponding optional
argument of the functions that create lexer buffers (whose default
value is true).

When with_positions is false, lexer actions should not modify the
position fields. Doing so nevertheless could re-enable position
tracking and degrade performance.

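One way to respect the warning above is to guard every position update
with Lexing.with_positions. This is only a sketch: the helper name
set_fname_if_tracked is hypothetical, and Lexing.set_filename requires
OCaml 4.11 or later.

```ocaml
let tracked = Lexing.from_string "abc"
let untracked = Lexing.from_string ~with_positions:false "abc"

(* Hypothetical guard: only touch position fields when tracking is on,
   so that an untracked lexbuf keeps holding dummy_pos. *)
let set_fname_if_tracked (lb : Lexing.lexbuf) (fname : string) : unit =
  if Lexing.with_positions lb then Lexing.set_filename lb fname

let () = set_fname_if_tracked tracked "a.ml"
let () = set_fname_if_tracked untracked "a.ml"
```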
Functions for lexer semantic actions
The following functions can be called from the semantic actions of
lexer definitions (the ML code enclosed in braces that computes the
value returned by lexing functions). They give access to the character
string matched by the regular expression associated with the semantic
action. These functions must be applied to the argument lexbuf, which,
in the code generated by ocamllex, is bound to the lexer buffer passed
to the parsing function.

val lexeme : lexbuf -> string

Lexing.lexeme lexbuf returns the string matched by the regular
expression.

val lexeme_char : lexbuf -> int -> char

Lexing.lexeme_char lexbuf i returns character number i in the matched
string.

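The sketch below shows the kind of helper a semantic action might call.
So that it runs without an ocamllex-generated scanner, it sets
lex_start_pos and lex_curr_pos by hand to imitate a match of the first
three characters; this is an assumption made purely for illustration, and
real code leaves those fields to the lexing engine. The helper name is_let
is hypothetical.

```ocaml
(* Hypothetical helper of the kind a semantic action might call:
   decide whether the matched text is the keyword "let". *)
let is_let (lexbuf : Lexing.lexbuf) : bool =
  Lexing.lexeme lexbuf = "let"

let lexbuf = Lexing.from_string "let x = 1"

(* Illustration only: imitate what the lexing engine would record after
   matching the first three characters. Real code never does this by hand. *)
let () =
  lexbuf.Lexing.lex_start_pos <- 0;
  lexbuf.Lexing.lex_curr_pos <- 3

let matched : string = Lexing.lexeme lexbuf
(* Characters of the match are numbered from 0. *)
let second : char = Lexing.lexeme_char lexbuf 1
```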
val lexeme_start : lexbuf -> int

Lexing.lexeme_start lexbuf returns the offset in the input stream of
the first character of the matched string. The first character of the
stream has offset 0.

val lexeme_end : lexbuf -> int

Lexing.lexeme_end lexbuf returns the offset in the input stream of the
character following the last character of the matched string. The first
character of the stream has offset 0.

val lexeme_start_p : lexbuf -> position

Like lexeme_start, but return a complete position instead of an
offset. When position tracking is disabled, the function returns
dummy_pos.

val lexeme_end_p : lexbuf -> position

Like lexeme_end, but return a complete position instead of an offset.
When position tracking is disabled, the function returns dummy_pos.

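A possible use of the two position-returning functions is to build a
human-readable description of where the current lexeme lies, for example
in an error message. The helper name describe_span and the message format
are hypothetical, and the sketch assumes the lexeme does not span several
lines.

```ocaml
(* Hypothetical helper for semantic actions: describe where the current
   lexeme lies, as "file, line L, characters A-B" (0-based columns).
   Assumes the lexeme starts and ends on the same line. *)
let describe_span (lexbuf : Lexing.lexbuf) : string =
  let s = Lexing.lexeme_start_p lexbuf in
  let e = Lexing.lexeme_end_p lexbuf in
  Printf.sprintf "%s, line %d, characters %d-%d"
    s.Lexing.pos_fname s.Lexing.pos_lnum
    (s.Lexing.pos_cnum - s.Lexing.pos_bol)
    (e.Lexing.pos_cnum - s.Lexing.pos_bol)
```

When position tracking is disabled both functions return dummy_pos, so a
caller would normally check Lexing.with_positions before using this.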
val new_line : lexbuf -> unit

Update the lex_curr_p field of the lexbuf to reflect the start of a new
line. You can call this function in the semantic action of the rule
that matches the end-of-line character. The function does nothing when
position tracking is disabled.

Since 3.11.0

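The effect of new_line can be observed directly on a lexbuf, without
running a scanner. In an ocamllex definition the call would sit in the
action of the end-of-line rule; the rule name token in the comment below
is hypothetical.

```ocaml
(* In an ocamllex rule this would typically read (rule name hypothetical):
     | '\n' { Lexing.new_line lexbuf; token lexbuf }                      *)
let lexbuf = Lexing.from_string "a\nb"
let before = lexbuf.Lexing.lex_curr_p
let () = Lexing.new_line lexbuf
let after = lexbuf.Lexing.lex_curr_p

(* With position tracking disabled, new_line leaves the lexbuf alone. *)
let untracked = Lexing.from_string ~with_positions:false "a\nb"
let () = Lexing.new_line untracked
```

After the call, the line number has been incremented and pos_bol has been
set to the current pos_cnum, so the column computed as pos_cnum - pos_bol
restarts at 0.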
Miscellaneous functions
val flush_input : lexbuf -> unit

Discard the contents of the buffer and reset the current position to 0.
The next use of the lexbuf will trigger a refill.

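flush_input is mostly useful in interactive loops, where the rest of a
faulty input should be dropped before reading again. The wrapper below is
a sketch under that assumption: the name with_recovery is hypothetical,
and it catches only Failure, the exception that ocamllex-generated
scanners raise when no rule matches.

```ocaml
(* Hypothetical recovery wrapper: run one scanning or parsing step and,
   if it raises Failure, drop whatever is still buffered. *)
let with_recovery (lexbuf : Lexing.lexbuf) (step : Lexing.lexbuf -> 'a)
    : 'a option =
  match step lexbuf with
  | v -> Some v
  | exception Failure _ ->
      Lexing.flush_input lexbuf;
      None
```

A real loop would usually also catch the parser's own error exception and
report the position before flushing.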

OCamldoc 2023-01-23 Lexing(3)